One-stream coding for asymmetric stereo video

ABSTRACT

An asymmetric frame of a coded video bitstream may include a first resolution picture of a left view and a reduced resolution picture of a right view, where the left and right views form a stereo view pair for three-dimensional video playback. In addition, the reduced resolution frame may be predicted relative to a picture of the left view. In one example, an apparatus includes a video encoder configured to encode a first picture of a first view of a scene to produce an encoded picture with a first resolution, encode at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and output the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.

This application claims the benefit of U.S. Provisional Application No. 61/334,253, filed May 13, 2010, U.S. Provisional Application No. 61/366,436, filed Jul. 21, 2010, and U.S. Provisional Application No. 61/433,110, filed on Jan. 14, 2011, each of which is hereby incorporated by reference in its entirety.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to the following co-pending U.S. Patent Application:

-   “FRAME PACKING FOR ASYMMETRIC STEREO VIDEO” by Ying Chen et al., having Attorney Docket No. 101116, filed concurrently herewith, assigned to the assignee hereof, and expressly incorporated by reference herein.

TECHNICAL FIELD

This disclosure relates to video coding.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice or temporal prediction with respect to other reference frames.

Efforts have been made to develop new video coding standards based on H.264/AVC. One such standard is the scalable video coding (SVC) standard, which is the scalable extension to H.264/AVC. Another is multiview video coding (MVC), which has become the multiview extension to H.264/AVC. A joint draft of MVC is described in JVT-AB204, “Joint Draft 8.0 on Multiview Video Coding,” 28th JVT meeting, Hannover, Germany, July 2008, available at http://wftp3.itu.int/av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip. A version of the AVC standard is described in JVT-AD007, “Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding—in preparation for ITU-T SG 16 AAP Consent (in integrated form),” 30th JVT meeting, Geneva, CH, February 2009, available from http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007.zip. The JVT-AD007 document integrates SVC and MVC in the AVC specification.

SUMMARY

In general, this disclosure describes techniques for supporting stereo video data, e.g., video data used to produce a three-dimensional (3D) effect. To produce a three-dimensional effect in video, two views of a scene, e.g., a left eye view and a right eye view, are shown simultaneously or nearly simultaneously. The techniques of this disclosure include forming a bitstream having packed frames, where a packed frame corresponds to a single frame having data for two views of a scene. In particular, the techniques of this disclosure include encoding a packed frame having a full resolution frame of one view of a scene and a reduced resolution frame of another view of the scene. The reduced resolution frame may be encoded with respect to a frame of the other view. In this manner, this disclosure also provides techniques for performing inter-view prediction for a reduced resolution frame of a packed frame.

In one example, a method includes receiving a first picture of a first view of a scene having a first resolution, receiving a second picture of a second view of the scene having a reduced resolution relative to the first resolution, forming an asymmetric frame comprising the first resolution picture and the reduced resolution picture, encoding the asymmetric frame, and outputting the asymmetric frame.

In another example, an apparatus for encoding video data includes a video encoder configured to receive a first picture of a first view of a scene having a first resolution, receive a second picture of a second view of the scene having a reduced resolution relative to the first resolution, form an asymmetric frame comprising the first picture and the second picture, and encode the asymmetric frame.

In another example, an apparatus includes means for receiving a first picture of a first view of a scene having a first resolution, means for receiving a second picture of a second view of the scene having a reduced resolution relative to the first resolution, means for forming an asymmetric frame comprising the first picture and the second picture, and means for encoding the asymmetric frame.

In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to receive a first picture of a first view of a scene having a first resolution, receive a second picture of a second view of the scene having a reduced resolution relative to the first resolution, form an asymmetric frame comprising the first picture and the second picture, encode the asymmetric frame, and output the encoded asymmetric frame.

In another example, a method includes receiving an encoded asymmetric frame comprising a first resolution picture of a first view of a scene and a reduced resolution picture of a second view of the scene, wherein the reduced resolution picture has a reduced resolution relative to the first resolution, decoding the asymmetric frame, separating the decoded asymmetric frame into the first resolution picture and the reduced resolution picture, upsampling the reduced resolution picture to produce a second picture of the scene having the first resolution, and outputting the first picture and the second picture, wherein the first picture and the second picture form a stereo image pair.

In another example, an apparatus includes a video decoder configured to receive an encoded asymmetric frame comprising a first resolution picture of a first view of a scene and a reduced resolution picture of a second view of the scene, wherein the reduced resolution picture has a reduced resolution relative to the first resolution, decode the asymmetric frame, separate the decoded asymmetric frame into the first resolution picture and the reduced resolution picture, and upsample the reduced resolution picture to produce a second picture of the scene having the first resolution, wherein the first resolution picture and the second picture form a stereo image pair.

In another example, an apparatus includes means for receiving an asymmetric frame comprising a first resolution picture of a first view of a scene and a reduced resolution picture of a second view of the scene, wherein the reduced resolution picture has a reduced resolution relative to the first resolution, means for decoding the asymmetric frame, means for separating the decoded asymmetric frame into the first resolution picture and the reduced resolution picture, and means for upsampling the reduced resolution picture to produce a second picture of the scene having the first resolution, wherein the first resolution picture and the second picture form a stereo image pair.

In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to receive an asymmetric frame comprising a first resolution picture of a first view of a scene and a reduced resolution picture of a second view of the scene, wherein the reduced resolution picture has a reduced resolution relative to the first resolution, decode the asymmetric frame, separate the decoded asymmetric frame into the first resolution picture and the reduced resolution picture, upsample the reduced resolution picture to produce a second picture of the scene with the first resolution, and output the first picture and the second picture, wherein the first picture and the second picture form a stereo image pair.

In another example, a method includes encoding a first picture of a first view of a scene to produce an encoded picture with a first resolution, encoding at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and outputting the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.

In another example, an apparatus includes a video encoder configured to encode a first picture of a first view of a scene to produce an encoded picture with a first resolution, encode at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and output the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.

In another example, an apparatus includes means for encoding a first picture of a first view of a scene to produce an encoded picture with a first resolution, means for encoding at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and means for outputting the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.

In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to encode a first picture of a first view of a scene to produce an encoded picture with a first resolution, encode at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and output the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.

In another example, a method includes receiving, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution, decoding the first resolution encoded picture to produce a first decoded picture, decoding at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view, upsampling the reduced resolution picture to produce a second decoded picture of the scene with the first resolution, and outputting the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.

In another example, an apparatus includes a video decoder configured to receive, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution, decode the first resolution encoded picture to produce a first decoded picture, decode at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view, upsample the reduced resolution picture to produce a second decoded picture of the scene with the first resolution, and output the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.

In another example, an apparatus includes means for receiving, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution, means for decoding the first resolution encoded picture to produce a first decoded picture, means for decoding at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view, means for upsampling the reduced resolution picture to produce a second decoded picture of the scene with the first resolution, and means for outputting the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.

In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to receive, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, decode the first resolution encoded picture to produce a first decoded picture, decode at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view, upsample the reduced resolution picture to produce a second decoded picture of the scene with the first resolution, and output the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques for forming asymmetric packed frames including pictures from two corresponding views of a scene.

FIG. 2 is a block diagram illustrating an example of a video encoder that may implement techniques for producing asymmetric packed frames.

FIG. 3 is a block diagram illustrating an example of a video decoder, which decodes an encoded video sequence.

FIG. 4 is a conceptual diagram illustrating pictures of a left eye view and a right eye view being combined by a video encoder to form an asymmetric packed frame having a top-bottom frame packing arrangement.

FIG. 5 is a conceptual diagram illustrating pictures of a left eye view and a right eye view being combined by a video encoder to form an asymmetric packed frame having a side-by-side frame packing arrangement.

FIG. 6 is a conceptual diagram illustrating an example process for forming an asymmetric packed frame including a reduced resolution picture encoded as a field.

FIG. 7 is a conceptual diagram illustrating field encoding of a picture to produce a reduced resolution encoded picture for inclusion in an asymmetric packed frame.

FIG. 8 is a conceptual diagram illustrating inter-view prediction of a block of a reduced resolution encoded picture of an asymmetric packed frame.

FIG. 9 is a flowchart illustrating an example method for encoding two pictures of two different views and combining the pictures to form an asymmetric packed frame.

FIG. 10 is a flowchart illustrating an example method for decoding an asymmetric frame.

FIG. 11 is a flowchart illustrating an example method for performing frame field interleaved coding in accordance with the techniques of this disclosure.

FIG. 12 is a flowchart illustrating an example method for decoding a frame field interleaved coded bitstream in accordance with the techniques of this disclosure.

DETAILED DESCRIPTION

In general, this disclosure relates to techniques for supporting stereo video data, e.g., video data used to produce a three-dimensional effect. To produce a three-dimensional effect in video, two views of a scene, e.g., a left eye view and a right eye view, are shown simultaneously or nearly simultaneously. Two pictures of the same scene, corresponding to the left eye view and the right eye view of the scene, may be captured from slightly different horizontal positions, representing the horizontal offset between a viewer's left and right eyes. By displaying these two pictures simultaneously or nearly simultaneously, such that the left eye view picture is perceived by the viewer's left eye and the right eye view picture is perceived by the viewer's right eye, the viewer may experience a three-dimensional video effect.

This disclosure provides techniques for forming a bitstream including packed frames. A packed frame may correspond to a single frame of video data having data for two pictures corresponding to different views of a scene. In particular, the techniques of this disclosure include encoding a packed frame having a full resolution picture of one view of a scene and a reduced resolution picture of another view of the scene. A packed frame including a full resolution picture of a first view of a scene and a reduced resolution picture of a second, different view of the scene may be referred to as an asymmetric packed frame, or simply an asymmetric frame.

In general, the terms “picture” and “frame” may be used interchangeably. This disclosure generally refers to a picture as a sample of a view. This disclosure generally refers to a frame as comprising one or more pictures and as being coded as an access unit representing a specific time instance. Accordingly, a frame may correspond to a sample of a view (that is, a single picture) or, in the case of packed frames, include samples from multiple views (that is, two or more pictures).

As an example, two view pictures may be packed as a frame with a top-bottom format. In this example, one view picture may be arranged on top of the other. Each picture may have the same width of w pixels. The full resolution picture may have a height of h pixels, while the reduced resolution picture may have a height of h/2 pixels, so that the packed frame is w pixels wide and 3h/2 pixels high. As another example, two view pictures may be packed as a frame with a side-by-side format. In this example, the two view pictures may be arranged beside each other. Each picture may have the same height of h pixels. The full resolution picture may have a width of w pixels, while the reduced resolution picture may have a width of w/2 pixels, so that the packed frame is 3w/2 pixels wide and h pixels high.
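The two packing formats above can be illustrated with a minimal Python sketch. It assumes each picture is a single plane of samples stored as a list of rows (picture[y][x]); the function names pack_asymmetric and unpack_asymmetric and the arrangement strings are illustrative only and are not taken from any codec API.

```python
from typing import List, Tuple

Picture = List[List[int]]  # one sample plane, indexed as picture[y][x]


def pack_asymmetric(full: Picture, reduced: Picture, arrangement: str) -> Picture:
    """Combine a full resolution picture and a reduced resolution picture into one frame.

    "top_bottom":   full picture is w x h, reduced picture is w x h/2 (placed below).
    "side_by_side": full picture is w x h, reduced picture is w/2 x h (placed to the right).
    """
    h, w = len(full), len(full[0])
    if arrangement == "top_bottom":
        if len(reduced) != h // 2 or len(reduced[0]) != w:
            raise ValueError("reduced picture must be w x h/2")
        return [row[:] for row in full] + [row[:] for row in reduced]
    if arrangement == "side_by_side":
        if len(reduced) != h or len(reduced[0]) != w // 2:
            raise ValueError("reduced picture must be w/2 x h")
        return [f + r for f, r in zip(full, reduced)]
    raise ValueError(f"unknown arrangement: {arrangement}")


def unpack_asymmetric(frame: Picture, w: int, h: int, arrangement: str) -> Tuple[Picture, Picture]:
    """Split a packed frame back into its full (w x h) and reduced resolution pictures."""
    if arrangement == "top_bottom":
        return [row[:] for row in frame[:h]], [row[:] for row in frame[h:]]
    if arrangement == "side_by_side":
        return [row[:w] for row in frame], [row[w:] for row in frame]
    raise ValueError(f"unknown arrangement: {arrangement}")


# Usage: a 4x4 full resolution picture and its two possible reduced companions.
full = [[10 * y + x for x in range(4)] for y in range(4)]
reduced_tb = [[100 + x for x in range(4)] for _ in range(2)]   # w x h/2
reduced_sbs = [[200 + y, 200 + y] for y in range(4)]          # w/2 x h

tb = pack_asymmetric(full, reduced_tb, "top_bottom")
sbs = pack_asymmetric(full, reduced_sbs, "side_by_side")
print(len(tb), len(tb[0]))    # 6 4  (height 3h/2, width w)
print(len(sbs), len(sbs[0]))  # 4 6  (height h, width 3w/2)
```

Copying rows rather than sharing them keeps the packed frame independent of the source pictures, which mirrors the fact that an encoder writes the packed frame into its own buffer.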

Forming asymmetric frames in this manner may provide several advantages. For example, the same bitstream may be sent to devices configured to present three-dimensional video data and to devices that are limited to two-dimensional video data. Three-dimensional video capable devices may separate the asymmetric frames into constituent views, upsample the reduced resolution view, and display the two views simultaneously or nearly simultaneously. Two-dimensional video capable devices may discard the reduced resolution view and display only the full resolution view. In this manner, a video content provider, e.g., a network-based server or broadcaster, need form only one bitstream, and devices with varying capabilities may each receive the same bitstream. Moreover, the bitstream may require less bandwidth than a bitstream comprising full resolution pictures of each of two or more views, while introducing only negligible subjective quality degradation.
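A receiver's handling of a top-bottom asymmetric frame can be sketched as follows, assuming the reduced resolution picture was produced by vertical 2:1 down-sampling. The sketch uses simple linear interpolation between rows as the upsampling filter; this filter and the function names are assumptions for illustration, since the disclosure does not mandate a particular upsampling method.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]  # one sample plane, indexed as picture[y][x]


def upsample_rows_2x(reduced: Picture) -> Picture:
    """Double the height of a picture by linear interpolation between rows.

    Even output rows copy input rows; odd output rows are the rounded average of
    an input row and the row below it (the last row is repeated at the bottom edge).
    """
    out: Picture = []
    last = len(reduced) - 1
    for k, row in enumerate(reduced):
        below = reduced[min(k + 1, last)]
        out.append(row[:])
        out.append([(a + b + 1) // 2 for a, b in zip(row, below)])
    return out


def present_top_bottom(frame: Picture, full_h: int,
                       stereo_capable: bool) -> Tuple[Picture, Optional[Picture]]:
    """Return (first view, second view or None) for a top-bottom asymmetric frame."""
    first_view = [row[:] for row in frame[:full_h]]
    if not stereo_capable:
        return first_view, None          # 2D display: discard the reduced view
    second_view = upsample_rows_2x(frame[full_h:])
    return first_view, second_view       # 3D display: both views at full resolution


# A 2-wide, 4-high full picture stacked above a 2x2 reduced picture.
frame = [[0, 0]] * 4 + [[10, 20], [30, 40]]
print(present_top_bottom(frame, 4, stereo_capable=False)[1])  # None
print(present_top_bottom(frame, 4, stereo_capable=True)[1])
# [[10, 20], [20, 30], [30, 40], [30, 40]]
```

The two-dimensional path touches only the full resolution rows, which is why a legacy display never needs to run the upsampler.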

Accordingly, the techniques of this disclosure may support backward compatibility with legacy devices that are not capable of presenting three-dimensional video data. Unlike devices that receive and decode symmetric packed frames, which include two sub-sampled pictures, devices receiving asymmetric packed frames in accordance with the techniques of this disclosure may receive a full resolution picture and a reduced resolution picture. Accordingly, such devices need not upsample a picture just to produce a two-dimensional video presentation. Furthermore, a bitstream in accordance with the techniques of this disclosure (e.g., including asymmetric packed frames) may consume less bandwidth than a bitstream having two full resolution pictures for three-dimensional video data.
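The bandwidth argument can be made concrete by counting raw (uncompressed) luma samples per time instance. The sketch below assumes 1920x1080 views and top-bottom packing; actual coded bitrates depend on the encoder and the content, so the ratio describes sample counts only, not compressed sizes.

```python
def samples_per_time_instance(w: int, h: int) -> dict:
    """Luma sample counts for three ways of carrying one time instance of video."""
    single_view = w * h                  # 2D: one full resolution picture
    asymmetric = w * h + w * (h // 2)    # full picture plus a w x h/2 reduced picture
    two_full_views = 2 * w * h           # both views at full resolution
    return {
        "single_view": single_view,
        "asymmetric": asymmetric,
        "two_full_views": two_full_views,
        "asymmetric_vs_two_full": asymmetric / two_full_views,
    }


counts = samples_per_time_instance(1920, 1080)
print(counts["asymmetric"], counts["two_full_views"], counts["asymmetric_vs_two_full"])
# 3110400 4147200 0.75
```

Under these assumptions an asymmetric frame carries three quarters of the samples of a two-view full resolution pair, while still containing an intact full resolution picture for two-dimensional playback.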

In some examples, the reduced resolution picture may be encoded with respect to a picture of the other view. That is, an encoder may perform inter-view prediction for reduced resolution pictures of asymmetric packed frames. This disclosure describes techniques for encoding the reduced resolution pictures as fields and using displacement vectors to inter-view encode the reduced resolution pictures. In this manner, this disclosure also provides techniques for performing inter-view prediction for a reduced resolution picture of an asymmetric packed frame. This disclosure further provides frame field interleaved coding techniques, in which pictures of one view may be coded as frames, while pictures of another view may be coded as fields, and the frame pictures and field pictures of the two views may be interleaved in a common bitstream. The pictures of each view may form discrete, independent access units of the same bitstream.
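One way to picture field coding with inter-view prediction is sketched below. It assumes the reduced resolution picture is the top field (even rows) of the second view, that the reference is the top field of the first view, and that a displacement vector is an integer (dx, dy) offset in field sample units, with coordinates clamped at the picture edges. These are simplifying assumptions and the function names are hypothetical; the sketch is not the exact prediction process of this disclosure.

```python
from typing import List, Tuple

Picture = List[List[int]]  # one sample plane, indexed as picture[y][x]


def top_field(picture: Picture) -> Picture:
    """Keep the even rows, halving the vertical resolution (a field of the picture)."""
    return [row[:] for row in picture[0::2]]


def predict_block(ref: Picture, x0: int, y0: int, bw: int, bh: int,
                  dv: Tuple[int, int]) -> Picture:
    """Fetch a bw x bh prediction block from ref, displaced by dv = (dx, dy).

    Coordinates are clamped to the reference picture, mimicking edge padding.
    """
    dx, dy = dv
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(y0 + dy + j, 0), h - 1)][min(max(x0 + dx + i, 0), w - 1)]
             for i in range(bw)] for j in range(bh)]


def residual(cur: Picture, pred: Picture, x0: int, y0: int) -> Picture:
    """Difference between the current block at (x0, y0) and its prediction."""
    return [[cur[y0 + j][x0 + i] - p for i, p in enumerate(prow)]
            for j, prow in enumerate(pred)]


# Usage: the second view is the first view shifted left by one sample.
left_view = [[10 * y + x for x in range(6)] for y in range(4)]
right_view = [[row[min(x + 1, 5)] for x in range(6)] for row in left_view]

ref_field = top_field(left_view)     # reference picture of the first view
cur_field = top_field(right_view)    # reduced resolution picture to encode
pred = predict_block(ref_field, 0, 0, 2, 2, dv=(1, 0))
print(residual(cur_field, pred, 0, 0))  # [[0, 0], [0, 0]]
```

When the displacement vector matches the true horizontal disparity, the residual vanishes and only the vector needs to be coded for that block.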

This disclosure also provides techniques for signaling a frame packing type at the network abstraction layer (NAL), e.g., in supplemental enhancement information (SEI) messages of NAL units. NAL units may include and/or describe coded audio and video data, e.g., using SEI messages. In the example of H.264/AVC (Advanced Video Coding), coded video segments are organized into NAL units, which provide a “network-friendly” video representation addressing applications such as video telephony, storage, broadcast, or streaming. NAL units can be categorized as Video Coding Layer (VCL) NAL units and non-VCL NAL units. VCL NAL units may contain output from the core compression engine and may include block, macroblock, and/or slice level data. Other NAL units are non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.
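The split between VCL and non-VCL NAL units is visible in the one-byte H.264/AVC NAL unit header, which carries forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits). The sketch below decodes that byte and classifies the unit using base-specification nal_unit_type values, in which types 1 through 5 are coded slice data (VCL), type 6 is SEI, and type 7 is a sequence parameter set. The function names are illustrative, and the extension NAL unit types used by SVC and MVC are not handled.

```python
from typing import NamedTuple

NAL_TYPE_SEI = 6  # supplemental enhancement information
NAL_TYPE_SPS = 7  # sequence parameter set


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of an H.264/AVC NAL unit into its three header fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def is_vcl(header: NalHeader) -> bool:
    """Coded slice NAL unit types 1-5 are VCL in the base specification."""
    return 1 <= header.nal_unit_type <= 5


for byte in (0x65, 0x06, 0x67):
    hdr = parse_nal_header(byte)
    print(hex(byte), hdr.nal_unit_type, "VCL" if is_vcl(hdr) else "non-VCL")
# 0x65 5 VCL      (IDR slice)
# 0x06 6 non-VCL  (SEI)
# 0x67 7 non-VCL  (SPS)
```

A decoder that ignores SEI can skip any NAL unit whose header reports type 6 without affecting the reconstruction of the VCL data.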

In some examples, the techniques of this disclosure may be applied to H.264/AVC codecs or codecs based on advanced video coding (AVC), such as scalable video coding (SVC), multiview video coding (MVC), or other extensions of H.264/AVC. Such codecs may be configured to recognize SEI messages when the SEI messages are associated with an access unit, where the SEI message may be encapsulated within the access unit in an ISO base media file format or MPEG-2 Systems bitstream. The techniques may also be applied to future coding standards, e.g., H.265/HEVC (High Efficiency Video Coding).

SEI messages may contain information that is not necessary for decoding the coded picture samples from VCL NAL units, but may assist in processes related to decoding, display, error resilience, and other purposes. SEI messages may be contained in non-VCL NAL units. Although SEI messages are a normative part of some standard specifications, they are not always mandatory for standard-compliant decoder implementations. SEI messages may be sequence level SEI messages or picture level SEI messages. Some sequence level information may be contained in SEI messages, such as scalability information SEI messages in the example of SVC and view scalability information SEI messages in MVC. These example SEI messages may convey information on, e.g., extraction of operation points and characteristics of the operation points.
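Because SEI messages travel in non-VCL NAL units, a decoder or post-processor that needs them must walk the SEI payload. In H.264/AVC, each sei_message() codes payloadType and payloadSize as a run of 0xFF bytes (each adding 255) followed by a final byte, and the data must be read from the RBSP, that is, after emulation prevention bytes (a 0x03 following two zero bytes) are removed. The sketch below follows that structure under simplifying assumptions: the caller has already stripped the one-byte NAL unit header, and no error checking is done on malformed input. The payload in the usage line is a dummy, not a real frame packing arrangement message.

```python
from typing import List, Tuple

FRAME_PACKING_ARRANGEMENT = 45  # H.264/AVC payloadType of the frame packing arrangement SEI


def nal_payload_to_rbsp(payload: bytes) -> bytes:
    """Remove emulation prevention bytes: a 0x03 that follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0          # drop the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def _read_ff_coded(rbsp: bytes, pos: int) -> Tuple[int, int]:
    """Read a value coded as N bytes of 0xFF (255 each) plus one final byte."""
    value = 0
    while rbsp[pos] == 0xFF:
        value += 255
        pos += 1
    return value + rbsp[pos], pos + 1


def parse_sei_rbsp(rbsp: bytes) -> List[Tuple[int, bytes]]:
    """Return (payloadType, payload bytes) for each SEI message in an SEI RBSP."""
    messages = []
    pos = 0
    # Stop at the rbsp_trailing_bits byte (0x80) that ends the RBSP.
    while pos < len(rbsp) and rbsp[pos:] != b"\x80":
        payload_type, pos = _read_ff_coded(rbsp, pos)
        payload_size, pos = _read_ff_coded(rbsp, pos)
        messages.append((payload_type, rbsp[pos:pos + payload_size]))
        pos += payload_size
    return messages


# Usage: one message of type 45 carrying a 2-byte dummy payload.
sei = parse_sei_rbsp(nal_payload_to_rbsp(bytes([45, 2, 0xAA, 0xBB, 0x80])))
print(sei)  # [(45, b'\xaa\xbb')]
```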

H.264/AVC provides a frame packing SEI message, which is a codec-level message indicating a frame packing type for a frame including two pictures, e.g., a left view and a right view of a scene. In this manner, H.264/AVC supports interleaving two pictures of a left view and a right view into one picture and coding such pictures into a video sequence. The frame packing SEI message is described in “Information technology—Coding of audio-visual objects—Part 10: Advanced Video Coding, AMENDMENT 1: Constrained baseline profile, stereo high profile and frame packing arrangement SEI message,” N10703, MPEG of ISO/IEC JTC1/SC29/WG11, Xian, China, October 2009, which is incorporated into the most recent version of the H.264/AVC standard.

In this SEI message, various types of frame packing methods are supported for spatial interleaving of two frames. The supported interleaving methods include checkerboard, column interleaving, row interleaving, side-by-side, top-bottom, and side-by-side with checkerboard upconversion. This disclosure provides techniques for supporting additional frame packing types, such as asymmetric frame packing arrangements. In particular, this disclosure provides a modified frame packing SEI message that indicates whether asymmetric packing is enabled for a particular frame, and if so, whether the asymmetric frame is packed top-bottom or side-by-side. For example, the frame packing SEI message may indicate whether the pictures for the two views in the same frame are arranged with the reduced resolution picture below the full resolution picture or to the right of the full resolution picture in the frame. A decoder may use this information to determine whether the frame is an asymmetric frame and to properly separate the asymmetric frame into constituent pictures of the two views.
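A decoder's use of such signaling can be sketched as a mapping from the frame dimensions and the packing information to the rectangle occupied by each view. The sketch assumes a hypothetical AsymmetricPackingInfo structure holding an asymmetric flag and an arrangement code; the codes reuse the H.264/AVC frame_packing_arrangement_type values 3 (side-by-side) and 4 (top-bottom). It also assumes the reduced resolution picture is exactly half the full resolution picture in the packed dimension, so the full picture occupies two thirds of the frame. It does not represent the actual syntax of the modified SEI message.

```python
from dataclasses import dataclass
from typing import Tuple

SIDE_BY_SIDE = 3  # frame_packing_arrangement_type value in H.264/AVC
TOP_BOTTOM = 4

Rect = Tuple[int, int, int, int]  # (x, y, width, height) within the decoded frame


@dataclass
class AsymmetricPackingInfo:  # hypothetical container for the signaled information
    asymmetric: bool
    arrangement: int


def view_rectangles(frame_w: int, frame_h: int, info: AsymmetricPackingInfo) -> Tuple[Rect, Rect]:
    """Return (first view rect, second view rect) for a packed frame."""
    if info.arrangement == SIDE_BY_SIDE:
        # Asymmetric: frame width is 3w/2, so the full picture spans 2/3 of it.
        split = frame_w * 2 // 3 if info.asymmetric else frame_w // 2
        return (0, 0, split, frame_h), (split, 0, frame_w - split, frame_h)
    if info.arrangement == TOP_BOTTOM:
        split = frame_h * 2 // 3 if info.asymmetric else frame_h // 2
        return (0, 0, frame_w, split), (0, split, frame_w, frame_h - split)
    raise ValueError("arrangement not handled in this sketch")


print(view_rectangles(2880, 1080, AsymmetricPackingInfo(True, SIDE_BY_SIDE)))
# ((0, 0, 1920, 1080), (1920, 0, 960, 1080))
print(view_rectangles(1920, 1620, AsymmetricPackingInfo(True, TOP_BOTTOM)))
# ((0, 0, 1920, 1080), (0, 1080, 1920, 540))
```

The same function covers symmetric packing when the flag is false, which reflects the idea that asymmetric packing is signaled as a variant of the existing arrangements rather than a separate mechanism.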

In some examples, e.g., with respect to H.264/AVC, this disclosure includes techniques for signaling in an SEI message whether a frame is an asymmetric packed frame. As one example, an encoder may signal that a frame is an asymmetric packed frame in an independent SEI message. As another example, an encoder may signal that a frame is an asymmetric packed frame in a modified version of the frame packing arrangement SEI message. The encoder may also signal, in video usability information (VUI), an aspect ratio for the asymmetric packed frame to indicate a packing arrangement for the asymmetric packed frame. For example, the encoder may signal an aspect ratio of 4:3 (or one of the unspecified values of Table E-1 of the H.264/AVC specification) to indicate a side-by-side packing arrangement. As another example, the encoder may signal an aspect ratio of 3:4 (or, again, one of the unspecified values of Table E-1 of the H.264/AVC specification) to indicate a top-bottom packing arrangement.
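The VUI-based signaling described above can be sketched as a lookup on the signaled aspect ratio. In H.264/AVC VUI, aspect_ratio_idc equal to 255 (Extended_SAR) is followed by explicit sar_width and sar_height values, and aspect_ratio_idc equal to 14 denotes 4:3. The mapping of 4:3 to side-by-side and 3:4 to top-bottom follows the convention described in this paragraph; the function name and return strings are hypothetical, and other Table E-1 values are simply treated as carrying no packing information.

```python
from math import gcd
from typing import Optional

EXTENDED_SAR = 255     # aspect_ratio_idc: sar_width and sar_height follow explicitly
ASPECT_RATIO_4_3 = 14  # aspect_ratio_idc denoting a 4:3 ratio in Table E-1


def packing_from_vui(aspect_ratio_idc: int, sar_width: int = 0,
                     sar_height: int = 0) -> Optional[str]:
    """Infer an asymmetric packing arrangement from VUI aspect ratio fields.

    Hypothetical convention: 4:3 means side-by-side, 3:4 means top-bottom.
    """
    if aspect_ratio_idc == ASPECT_RATIO_4_3:
        ratio = (4, 3)
    elif aspect_ratio_idc == EXTENDED_SAR and sar_width > 0 and sar_height > 0:
        g = gcd(sar_width, sar_height)
        ratio = (sar_width // g, sar_height // g)   # reduce, e.g. 8:6 -> 4:3
    else:
        return None
    return {(4, 3): "side_by_side", (3, 4): "top_bottom"}.get(ratio)


print(packing_from_vui(EXTENDED_SAR, 3, 4))  # top_bottom
print(packing_from_vui(1))                   # None
```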

It should be understood that methods for sub-sampling and up-sampling the reduced resolution picture are not limited to any particular techniques. For purposes of example, this disclosure generally describes horizontal or vertical down-sampling and up-sampling. However, quincunx (that is, checkerboard) sampling may also be used.
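Quincunx sampling keeps half of the samples in a checkerboard pattern: each row still halves in width, but the retained positions alternate from one row to the next. The sketch below shows one possible down-sampler and a simple up-sampler that fills each missing sample with the rounded average of its horizontal neighbours. It assumes an even picture width, and the interpolation filter is an illustrative choice rather than one specified by this disclosure.

```python
from typing import List

Picture = List[List[int]]  # one sample plane, indexed as picture[y][x]


def quincunx_downsample(picture: Picture) -> Picture:
    """Keep samples where (x + y) is even: a checkerboard with w/2 samples per row."""
    return [[v for x, v in enumerate(row) if (x + y) % 2 == 0]
            for y, row in enumerate(picture)]


def quincunx_upsample(packed: Picture, width: int) -> Picture:
    """Restore full width; missing samples take the rounded mean of horizontal neighbours."""
    out = []
    for y, kept in enumerate(packed):
        row = [None] * width
        for i, v in enumerate(kept):
            row[2 * i + (y % 2)] = v   # retained positions alternate between rows
        for x in range(width):
            if row[x] is None:
                # Neighbours of a missing sample are always retained samples.
                nbrs = [row[n] for n in (x - 1, x + 1)
                        if 0 <= n < width and row[n] is not None]
                row[x] = (sum(nbrs) + len(nbrs) // 2) // len(nbrs)
        out.append(row)
    return out


pic = [[1, 2, 3, 4], [5, 6, 7, 8]]
half = quincunx_downsample(pic)
print(half)                        # [[1, 3], [6, 8]]
print(quincunx_upsample(half, 4))  # [[1, 2, 3, 3], [6, 6, 7, 8]]
```

Compared with plain horizontal decimation, the alternating pattern retains samples from every column across pairs of rows, which is the usual motivation for checkerboard sampling.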

In addition, this disclosure provides techniques for transferring a bitstream including asymmetric packed frames over a high definition multimedia interface (HDMI). In this manner, this disclosure provides techniques by which a three-dimensional video interface, such as HDMI, may accept view images with asymmetric packing in one or more frames.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques for forming asymmetric packed frames including pictures from two corresponding views of a scene. As shown in FIG. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, such as fixed or mobile computing devices, set-top boxes, gaming consoles, digital media players, or the like. In some cases, source device 12 and destination device 14 may comprise wireless communication devices, such as wireless handsets, so-called cellular or satellite radiotelephones, or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless.

The techniques of this disclosure, however, which concern forming asymmetric packed frames, are not necessarily limited to wireless applications or settings. For example, these techniques may apply to over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet video transmissions, encoded digital video that is encoded onto a storage medium, or other scenarios. Accordingly, communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.

In the example of FIG. 1, source device 12 includes a video source 18, video encoder 20, a modulator/demodulator (modem) 22 and a transmitter 24. Destination device 14 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques for forming a bitstream including asymmetric packed frames, e.g., frames including coded data for two pictures, each from a different view of a scene, where one of the pictures has full resolution and the other picture has a reduced resolution, e.g., one-half of the resolution of the full resolution picture. Moreover, video encoder 20 may be configured to inter-view encode the reduced resolution picture. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated system 10 of FIG. 1 is merely one example. Techniques for producing asymmetric packed frames and splitting asymmetric packed frames into constituent views may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a “CODEC.” Moreover, aspects of the techniques of this disclosure may also be performed by a video preprocessor or video postprocessor, such as a file encapsulation unit, file decapsulation unit, video multiplexer, or video demultiplexer. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, video gaming, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications executed by mobile or generally non-mobile computing devices. In any case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20.

Video source 18 may provide pictures from two or more views to video encoder 20. Two pictures of the same scene may be captured simultaneously or nearly simultaneously from slightly different horizontal positions, such that the two pictures can be used to produce a three-dimensional effect. Alternatively, video source 18 (or another unit of source device 12) may use depth information or disparity information to generate a second picture of a second view from a first picture of a first view. The depth or disparity information may be determined by a camera capturing the first view, or may be calculated from data in the first view.

MPEG-C part 3 provides a specified format for including a depth map for a picture in a video stream. The specification is described in “Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information,” ISO/IEC JTC 1/SC 29/WG 11, MPEG Doc, N8768, Marrakech, Morocco, January 2007. In MPEG-C part 3, auxiliary video can be a depth map or a parallax map. When representing a depth map, MPEG-C part 3 may provide flexibility in terms of the number of bits used to represent each depth value and the resolution of the depth map. For example, the map may be one-quarter of the width and one-half of the height of the image described by the map. The map may be coded as a monochromatic video sample, e.g., within an H.264/AVC bitstream with only the luminance component. Alternatively, the map may be coded as auxiliary video data, as defined in H.264/AVC. In the context of this disclosure, a depth map or a parallax map may have the same resolution as the primary video data. Although the H.264/AVC specification does not currently specify the usage of auxiliary video data to code depth maps, the techniques of this disclosure may be used in conjunction with techniques for using such a depth map or parallax map.

The encoded video information may then be modulated by modem 22 according to a communication standard, and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation. Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.

Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information. Again, the video encoding process may implement one or more of the techniques described herein to form an asymmetric packed frame having a full resolution picture of one view and a reduced resolution picture of another view.

The information communicated over channel 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of macroblocks and other coded units, e.g., GOPs. Accordingly, video decoder 30 may unpack the asymmetric packed frame into constituent pictures of the views, decode the pictures, and upsample the reduced resolution picture to the full resolution. Display device 32 may display the decoded pictures to a user.

Display device 32 may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. Display device 32 may display the two pictures from the asymmetric packed frame simultaneously or nearly simultaneously. For example, display device 32 may comprise a stereoscopic three-dimensional display device capable of displaying two views simultaneously or nearly simultaneously.

A user may wear active glasses that rapidly and alternately shutter the left and right lenses, such that display device 32 may rapidly switch between the left and the right view in synchronization with the active glasses. Alternatively, display device 32 may display the two views simultaneously, and the user may wear passive glasses (e.g., with polarized lenses) which filter the views to cause the proper views to pass through to the user's eyes. As still another example, display device 32 may comprise an autostereoscopic display, for which no glasses are needed.

In some examples, modem 28 and video decoder 30 may be included in separate devices. The separate devices may be coupled by a high definition multimedia interface (HDMI). This disclosure, in some examples, proposes modifying HDMI to support transfer of asymmetric packed frames. HDMI provides three-dimensional video formats in Appendix H of version 1.4 of the HDMI specification, which is available at. This specification supports various formats for packing three-dimensional video data into one frame, e.g., in the 3D_Structure field. In accordance with the techniques of this disclosure, devices may exchange asymmetric packed frames via HDMI, in addition to those packing arrangements already provided by HDMI version 1.4.

As an example, the 3D_Structure field may include a value indicating that a frame has a frame packing format, which is similar to a top-bottom arrangement in H.264/AVC, but without sub-sampling. There may be some blank area in a frame having a frame packing format for HDMI. As another example, the 3D_Structure field may include a value indicating that a frame has a field alternative format, which indicates that a left-view image and a right-view image are fields of the corresponding frame. As another example, the 3D_Structure field may include a value indicating that a frame has a side-by-side full format, indicating that the views are arranged side-by-side and not sub-sampled.

As still another example, the 3D_Structure field may include a value indicating that a frame has a side-by-side half format, indicating that the views are sub-sampled to half horizontal resolution and are arranged side-by-side. When the side-by-side half format is enabled, subsampling and position information may also be signaled, e.g., in a 3D_Ext_Data field. The frame may support two types of sub-sampling: horizontal sub-sampling or quincunx (e.g., checkerboard) matrix. The position information may provide data indicating a phase shift of the sub-sampled left and right views. HDMI also supports texture image plus depth image information, as well as video content with graphics representation.

As noted above, the techniques of this disclosure include modifying HDMI to support asymmetric packed frames. For example, in accordance with this disclosure, a device may set a value for a 3D_Structure field of HDMI data to indicate that a frame is an asymmetric packed frame. The 3D_Structure field may include a value indicating that a frame includes a full resolution picture and a reduced resolution picture that form a stereo pair, and indicating that the pictures are arranged side-by-side or top-bottom.

In the example of FIG. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media. Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, and dated March, 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, or the like.

A video sequence typically includes a series of video frames. A group of pictures (GOP) generally comprises a series of one or more video frames. A GOP may include syntax data in a header of the GOP, a header of one or more frames of the GOP, or elsewhere, that describes a number of frames included in the GOP. Each frame may include frame syntax data that describes an encoding mode for the respective frame. Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data. A video block may correspond to a macroblock or a partition of a macroblock. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame may include a plurality of slices. Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks.

As an example, the ITU-T H.264 standard supports intra prediction in various block sizes, such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8×8 for chroma components, as well as inter prediction in various block sizes, such as 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4 for luma components and corresponding scaled sizes for chroma components. In this disclosure, “N×N” and “N by N” may be used interchangeably to refer to the pixel dimensions of the block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an N×N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise N×M pixels, where M is not necessarily equal to N.

Block sizes that are less than 16 by 16 may be referred to as partitions of a 16 by 16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video block data representing pixel differences between coded video blocks and predictive video blocks. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain.

Smaller video blocks can provide better resolution, and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and the various partitions, sometimes referred to as sub-blocks, may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame. Alternatively, frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term “coded unit” or “coding unit” may refer to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, a group of pictures (GOP) also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques.

In accordance with the techniques of this disclosure, video encoder 20 may form asymmetric packed frames from received video data of two views. That is, video encoder 20 may receive raw image data of two views from, e.g., video source 18. In general, each of the two views may include a sequence of pictures, such that for each picture of one view, there exists a picture of the other view that forms a stereo pair with the picture of the first view. A stereo pair generally corresponds to two pictures that, when displayed simultaneously or nearly simultaneously, produce a three-dimensional video effect. Pictures that form a stereo pair may include descriptive data, such as timestamps, to indicate a corresponding picture of another view with which a current picture forms a stereo pair.

In any case, video encoder 20 may encode a picture of a first view normally, e.g., in accordance with ITU-T H.264/AVC encoding standards or with another encoding standard such as MPEG-2, MPEG-4, H.265, or the like. Video encoder 20, or a video preprocessing unit of source device 12 (which may comprise a processor, processing unit, ASIC, DSP, FPGA, or other processing circuitry coupled between video source 18 and video encoder 20), may spatially downsample a picture of a second view that forms a stereo pair with the encoded picture of the first view. Spatial downsampling may comprise reducing spatial resolution, e.g., by reducing vertical and/or horizontal pixel resolution. In one example, video encoder 20 may reduce the vertical pixel resolution of the picture by one-half.
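As a rough illustration of one-half vertical downsampling, the following sketch averages each pair of vertically adjacent rows. This disclosure does not specify a downsampling filter. The two-tap averaging filter, the list-of-rows picture representation, and the name `downsample_vertical` are assumptions made only for illustration.

```python
from typing import List

Picture = List[List[int]]  # hypothetical representation: a list of rows of samples


def downsample_vertical(picture: Picture) -> Picture:
    """Halve the vertical resolution by averaging each pair of rows (2-tap filter)."""
    if len(picture) % 2:
        raise ValueError("picture height must be even")
    out: Picture = []
    for top, bottom in zip(picture[0::2], picture[1::2]):
        # Rounded average of the two vertically co-located samples.
        out.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return out


# A 4-row by 2-column picture becomes a 2-row by 2-column picture.
print(downsample_vertical([[0, 2], [2, 4], [10, 10], [20, 20]]))  # [[1, 3], [15, 15]]
```

A practical preprocessor would more likely use a longer low-pass filter to limit aliasing. The averaging filter here only keeps the example short.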

Video encoder 20 may then encode the reduced resolution picture of the other view. In some examples, video encoder 20 may encode the reduced resolution picture in an intra-prediction mode (e.g., as an I-picture) or in an inter-prediction mode (e.g., as a P-picture or a B-picture). In the inter-prediction case, video encoder 20 may encode the reduced resolution picture relative to other pictures in the same view that occur earlier (in decoding-time order) in a bitstream produced by video encoder 20. In some examples, video encoder 20 may implement inter-view prediction, in which video encoder 20 may encode the reduced resolution picture relative to pictures of the view that includes the full resolution picture. For example, video encoder 20 may encode the reduced resolution picture relative to previously encoded pictures of the view that includes the full resolution encoded picture. Video encoder 20 may encode the reduced resolution picture relative to the full resolution picture of the same packed frame, or of previously coded frames.

As one example, video encoder 20 may encode the reduced resolution picture as a field. Techniques for interlaced video data coding may be employed to encode the reduced resolution picture as a field, in which case horizontal rows of pixels of the reduced resolution picture may be predicted from alternate rows of pixels of a full resolution picture. That is, video encoder 20 may encode the reduced resolution picture as either a top field or a bottom field. In some examples, video encoder 20 may output the full resolution picture of one view as an access unit and a corresponding reduced resolution picture of a different view as a separate access unit. Thus, video encoder 20 need not necessarily combine the two pictures into an asymmetric frame to perform techniques for combining data for two views into a single, common bitstream.
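To make the field relationship concrete, the sketch below splits a frame into its top field (even rows) and bottom field (odd rows), then interleaves them back. Each field has half the vertical resolution of the frame, which is why a vertically halved picture can be handled with interlaced-coding tools. The function names are hypothetical. This is not an encoder; it only shows the row structure involved.

```python
from typing import List, Tuple

Picture = List[List[int]]  # hypothetical representation: a list of rows of samples


def split_fields(frame: Picture) -> Tuple[Picture, Picture]:
    """Return (top_field, bottom_field): the even-indexed and odd-indexed rows."""
    return frame[0::2], frame[1::2]


def interleave_fields(top: Picture, bottom: Picture) -> Picture:
    """Rebuild a frame by alternating rows from the top and bottom fields."""
    if len(top) != len(bottom):
        raise ValueError("fields must have the same number of rows")
    frame: Picture = []
    for t_row, b_row in zip(top, bottom):
        frame.append(t_row)
        frame.append(b_row)
    return frame


frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
top, bottom = split_fields(frame)
print(top, bottom)  # [[1, 1], [3, 3]] [[2, 2], [4, 4]]
```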

As another example, video encoder 20 may encode the reduced resolution picture using displacement vectors. The displacement vectors may be relative to reduced resolution pictures in the same view or full resolution pictures in the view including full resolution pictures. When the displacement vector refers to a full resolution picture, video encoder 20 may account for the position of the reduced resolution picture in the asymmetric frame. Suppose, for example, that the asymmetric packed frame includes the pictures in a top-bottom arrangement with the reduced resolution picture below the full resolution picture in the frame. Video encoder 20 may modify a vertical component of the displacement vector by subtracting the height of the full resolution picture from the vertical component and multiplying the resulting difference by two, assuming that the reduced resolution picture has one-half the vertical resolution of the full resolution picture.
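The vertical adjustment described above can be written as v' = 2(v − H). Here v is the vertical component of the displacement vector in packed-frame coordinates, and H is the height of the full resolution picture. The following sketch assumes:

- integer whole-pixel vector components (H.264/AVC itself stores luma motion vectors in quarter-sample units);
- a top-bottom arrangement with the reduced resolution picture below;
- a downsampling factor of two.

The function name is hypothetical.

```python
def adjust_vertical_displacement(v: int, full_height: int, factor: int = 2) -> int:
    """Return factor * (v - full_height), following the adjustment described above.

    v: vertical displacement component in packed-frame coordinates (whole pixels).
    full_height: height H of the full resolution picture at the top of the frame.
    factor: ratio of full to reduced vertical resolution (2 for one-half).
    """
    return (v - full_height) * factor


# Example with a 1080-row full resolution picture.
print(adjust_vertical_displacement(1100, 1080))  # 40
```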

Following intra-predictive or inter-predictive coding to produce predictive data and residual data, and following any transforms (such as the 4×4 or 8×8 integer transform used in H.264/AVC or a discrete cosine transform (DCT)) applied to residual data to produce transform coefficients, quantization of transform coefficients may be performed. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
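The n-bit to m-bit rounding mentioned above can be illustrated by a right shift, which discards the n − m least significant bits. This sketch illustrates only bit-depth reduction, and it assumes non-negative values. The actual H.264/AVC quantizer divides by a step size derived from the quantization parameter, which this sketch does not model. The function names are hypothetical.

```python
def reduce_bit_depth(value: int, n: int, m: int) -> int:
    """Round a non-negative n-bit value down to an m-bit value (n > m)."""
    if not (n > m >= 1) or not (0 <= value < (1 << n)):
        raise ValueError("expected n > m >= 1 and 0 <= value < 2**n")
    return value >> (n - m)  # drop the n - m least significant bits


def restore_scale(q: int, n: int, m: int) -> int:
    """Scale an m-bit value back to the n-bit range; the discarded bits are lost."""
    return q << (n - m)


q = reduce_bit_depth(200, 8, 4)
print(q, restore_scale(q, 8, 4))  # 12 192
```

The gap between 200 and the restored 192 is the quantization error that the reduced bit depth cannot represent.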

Following quantization, entropy coding of the quantized data may be performed, e.g., according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology. A processing unit configured for entropy coding, or another processing unit, may perform other processing functions, such as zero run length coding of quantized coefficients and/or generation of syntax information such as coded block pattern (CBP) values, macroblock type, coding mode, maximum macroblock size for a coded unit (such as a frame, slice, macroblock, or sequence), or the like.

Video encoder 20 may further send syntax data, such as block-based syntax data, frame-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe a number of frames in the respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame. Video decoder 30 may therefore comprise a standard video decoder and need not necessarily be specially configured to effect or utilize the techniques of this disclosure.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, a computing device, and/or a wireless communication device, such as a mobile telephone.

Video decoder 30 may be configured to receive a bitstream including asymmetric packed frames. Video decoder 30 may further be configured to unpack such a frame into corresponding pictures, e.g., a full resolution picture of one view and a reduced resolution picture of another view. Video decoder 30 may decode the pictures and upsample (e.g., through interpolation) the reduced resolution picture to produce two decoded, full resolution pictures. In some examples, video decoder 30 may decode the reduced resolution picture with reference to a decoded picture from the view corresponding to the full resolution picture. That is, video decoder 30 may also support inter-view prediction.
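The interpolation step can be sketched as follows for a picture that was downsampled vertically by one-half. The filter choice is an assumption:

- even output rows copy the decoded rows;
- odd output rows take the rounded average of the rows above and below;
- the last row is repeated at the bottom edge.

A practical decoder or postprocessor might use a longer interpolation filter. The function name is hypothetical.

```python
from typing import List

Picture = List[List[int]]  # hypothetical representation: a list of rows of samples


def upsample_vertical(picture: Picture) -> Picture:
    """Double the vertical resolution using simple linear interpolation."""
    out: Picture = []
    for i, row in enumerate(picture):
        # Use the next row, or repeat the current row at the bottom edge.
        below = picture[i + 1] if i + 1 < len(picture) else row
        out.append(list(row))
        out.append([(a + b + 1) // 2 for a, b in zip(row, below)])
    return out


print(upsample_vertical([[0, 4], [10, 8]]))  # [[0, 4], [5, 6], [10, 8], [10, 8]]
```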

In some examples, video decoder 30 may be configured to determine whether destination device 14 is capable of decoding and displaying three-dimensional data. If not, video decoder 30 may unpack a received asymmetric packed frame, but discard the reduced resolution picture. Video decoder 30 may decode the full resolution picture and other pictures of the same view, and cause display device 32 to display the pictures from this view to present two-dimensional video data. Thus, video decoder 30 may decode the full resolution picture and provide the decoded full resolution picture to display device 32, without attempting to decode the reduced resolution picture.

In this manner, whether or not destination device 14 is capable of displaying three-dimensional video data, destination device 14 may receive a bitstream including asymmetric packed frames. Thus, various destination devices with various decoding and rendering capabilities may be configured to receive the same bitstream from source device 12. That is, some destination devices may be capable of decoding and rendering three-dimensional video data while others may not be capable of decoding and/or rendering three-dimensional video data, yet each of the devices may be configured to receive and use data from the same bitstream including asymmetric packed frames.

FIG. 2 is a block diagram illustrating an example of video encoder 20 that may implement techniques for producing asymmetric packed frames. Video encoder 20 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence. Intra-mode (I-mode) may refer to any of several spatial-based compression modes, and inter-modes such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several temporal-based compression modes. Video encoder 20 may also, in some examples, be configured to perform inter-view prediction of reduced resolution pictures in an asymmetric packed frame.

As shown in FIG. 2, video encoder 20 receives a current video block within a video picture to be encoded. In the example of FIG. 2, video encoder 20 includes motion compensation unit 44, motion estimation unit 42, reference frame store 64, summer 50, transform unit 52, quantization unit 54, and entropy coding unit 56. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62.

During the encoding process, video encoder 20 receives a video picture or slice to be coded. The picture or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal compression. Intra prediction unit 46 may perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial compression. Mode select unit 40 may select one of the coding modes, intra or inter, e.g., based on error results, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use in a reference frame.

In particular, video encoder 20 may receive pictures from two views forming a stereo view pair. The two views may be referred to as view 0 and view 1. Without loss of generality, assume that view 0 is a left eye view and view 1 is a right eye view. It should be understood that the views may be labeled differently, and that instead, view 1 may correspond to the left eye view and view 0 may correspond to the right eye view. In one example, video encoder 20 may encode pictures of view 0 at a full resolution and pictures of view 1 at a reduced resolution. Video encoder 20 may downsample pictures of view 1 by a factor of one-half in the horizontal or the vertical direction.

Video encoder 20 may further pack the encoded pictures into an asymmetric packed frame. Assume, for example, that video encoder 20 receives a view 0 picture and a view 1 picture, each having a height of h pixels and a width of w pixels, where w and h are positive integers. Video encoder 20 may form a top-bottom arranged asymmetric packed frame by downsampling the height of the view 1 picture. For example, following downsampling and encoding of the view 1 picture, the encoded, downsampled view 1 picture may have a height of h/2 pixels and a width of w pixels. Video encoder 20 may then form an asymmetric packed frame including the encoded view 0 picture and the encoded, downsampled view 1 picture below the encoded view 0 picture, such that the asymmetric frame has a height of 3h/2 pixels and a width of w pixels.
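The top-bottom geometry can be sketched with pictures represented as lists of rows. The helpers below (hypothetical names) stack the h × w view 0 picture above the (h/2) × w view 1 picture, producing a frame of 3h/2 rows by w columns. They also split such a frame back into its two parts. For simplicity they operate on sample arrays rather than on encoded data, so they illustrate the layout only.

```python
from typing import List, Tuple

Picture = List[List[int]]  # hypothetical representation: a list of rows of samples


def pack_top_bottom(view0: Picture, view1_half: Picture) -> Picture:
    """Place an (h/2) x w view 1 picture below an h x w view 0 picture."""
    h, w = len(view0), len(view0[0])
    if 2 * len(view1_half) != h or any(len(r) != w for r in view1_half):
        raise ValueError("view 1 picture must be h/2 rows by w columns")
    return [list(r) for r in view0] + [list(r) for r in view1_half]


def unpack_top_bottom(frame: Picture) -> Tuple[Picture, Picture]:
    """Split a (3h/2) x w frame into the h x w and (h/2) x w pictures."""
    if len(frame) % 3:
        raise ValueError("frame height must be a multiple of 3")
    h = 2 * len(frame) // 3
    return frame[:h], frame[h:]


view0 = [[0] * 4 for _ in range(4)]  # h = 4, w = 4
view1 = [[1] * 4 for _ in range(2)]  # h/2 = 2, w = 4
frame = pack_top_bottom(view0, view1)
print(len(frame), len(frame[0]))  # 6 4
```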

As another example, video encoder 20 may form a side-by-side arranged asymmetric packed frame by downsampling the width of the view 1 picture. For example, following downsampling and encoding of the view 1 picture, the view 1 picture may have a width of w/2 pixels and a height of h pixels. Video encoder 20 may then form an asymmetric packed frame including the encoded view 0 picture and the encoded, downsampled view 1 picture to the right of the encoded view 0 picture, such that the asymmetric frame has a height of h pixels and a width of 3w/2 pixels.
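A corresponding sketch for the side-by-side arrangement concatenates each row of the h × w view 0 picture with the matching row of the h × (w/2) view 1 picture. The result is an h × (3w/2) frame. As with any such illustration, the names and the sample-array representation are assumptions.

```python
from typing import List, Tuple

Picture = List[List[int]]  # hypothetical representation: a list of rows of samples


def pack_side_by_side(view0: Picture, view1_half: Picture) -> Picture:
    """Place an h x (w/2) view 1 picture to the right of an h x w view 0 picture."""
    h, w = len(view0), len(view0[0])
    if len(view1_half) != h or any(2 * len(r) != w for r in view1_half):
        raise ValueError("view 1 picture must be h rows by w/2 columns")
    return [list(a) + list(b) for a, b in zip(view0, view1_half)]


def unpack_side_by_side(frame: Picture) -> Tuple[Picture, Picture]:
    """Split an h x (3w/2) frame into the h x w and h x (w/2) pictures."""
    width = len(frame[0])
    if width % 3:
        raise ValueError("frame width must be a multiple of 3")
    w = 2 * width // 3
    return [r[:w] for r in frame], [r[w:] for r in frame]


view0 = [[0] * 4 for _ in range(2)]  # h = 2, w = 4
view1 = [[1] * 2 for _ in range(2)]  # h = 2, w/2 = 2
frame = pack_side_by_side(view0, view1)
print(len(frame), len(frame[0]))  # 2 6
```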

Video encoder 20 may further provide information indicating a packing arrangement for an asymmetric packed frame. The information may indicate whether the frame is an asymmetric packed frame, and if so, whether the packing arrangement is side-by-side or top-bottom. As one example, video encoder 20 may provide this information in the form of a frame packing arrangement SEI message. The frame packing arrangement SEI message may be defined according to the example data structure of Table 1, below:

TABLE 1
frame_packing_arrangement SEI message
Syntax                                              C   Descriptor
frame_packing_arrangement( payloadSize ) {
  frame_packing_arrangement_id                      5   ue(v)
  frame_packing_arrangement_cancel_flag             5   u(1)
  if( !frame_packing_arrangement_cancel_flag ) {
    asymmetric_packing_idc                          5   u(2)
    frame_packing_arrangement_type                  5   u(5)
    quincunx_sampling_flag                          5   u(1)
    content_interpretation_type                     5   u(6)
    spatial_flipping_flag                           5   u(1)
    frame0_flipped_flag                             5   u(1)
    field_views_flag                                5   u(1)
    current_frame_is_frame0_flag                    5   u(1)
    frame0_self_contained_flag                      5   u(1)
    frame1_self_contained_flag                      5   u(1)
    if( !quincunx_sampling_flag &&
        frame_packing_arrangement_type != 5 ) {
      frame0_grid_position_x                        5   u(4)
      frame0_grid_position_y                        5   u(4)
      frame1_grid_position_x                        5   u(4)
      frame1_grid_position_y                        5   u(4)
    }
    frame_packing_arrangement_reserved_byte         5   u(8)
    frame_packing_arrangement_repetition_period     5   ue(v)
  }
  frame_packing_arrangement_extension_flag          5   u(1)
}
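To show how the syntax of Table 1 might be read, the following sketch parses the fields in the order listed. It uses u(n) for n-bit unsigned fields and ue(v) for unsigned Exp-Golomb codes, as in H.264/AVC. It assumes the SEI payload bytes have already been extracted from the NAL unit with emulation prevention bytes removed. The class and function names are hypothetical. A matching writer is included only so the example can round-trip a message.

```python
from typing import Dict, List


class BitReader:
    """Reads MSB-first bits using the u(n) and ue(v) descriptors of H.264/AVC."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        # Unsigned Exp-Golomb: count leading zeros, then read that many more bits.
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


class BitWriter:
    """Writes the same descriptors; used here only to build example input."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def u(self, n: int, value: int) -> None:
        self.bits.extend((value >> i) & 1 for i in range(n - 1, -1, -1))

    def ue(self, value: int) -> None:
        code = value + 1
        self.u(code.bit_length() - 1, 0)  # zero prefix
        self.u(code.bit_length(), code)

    def to_bytes(self) -> bytes:
        bits = self.bits + [0] * (-len(self.bits) % 8)  # zero-pad to a byte boundary
        return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                     for i in range(0, len(bits), 8))


# Fixed-length fields inside the "if( !cancel_flag )" branch, in Table 1 order.
FIELDS = [("asymmetric_packing_idc", 2), ("frame_packing_arrangement_type", 5),
          ("quincunx_sampling_flag", 1), ("content_interpretation_type", 6),
          ("spatial_flipping_flag", 1), ("frame0_flipped_flag", 1),
          ("field_views_flag", 1), ("current_frame_is_frame0_flag", 1),
          ("frame0_self_contained_flag", 1), ("frame1_self_contained_flag", 1)]
GRID = ["frame0_grid_position_x", "frame0_grid_position_y",
        "frame1_grid_position_x", "frame1_grid_position_y"]


def has_grid(msg: Dict[str, int]) -> bool:
    return not msg["quincunx_sampling_flag"] and msg["frame_packing_arrangement_type"] != 5


def parse_fpa(r: BitReader) -> Dict[str, int]:
    msg = {"frame_packing_arrangement_id": r.ue()}
    msg["frame_packing_arrangement_cancel_flag"] = r.u(1)
    if not msg["frame_packing_arrangement_cancel_flag"]:
        for name, width in FIELDS:
            msg[name] = r.u(width)
        if has_grid(msg):
            for name in GRID:
                msg[name] = r.u(4)
        msg["frame_packing_arrangement_reserved_byte"] = r.u(8)
        msg["frame_packing_arrangement_repetition_period"] = r.ue()
    msg["frame_packing_arrangement_extension_flag"] = r.u(1)
    return msg


def write_fpa(w: BitWriter, msg: Dict[str, int]) -> None:
    w.ue(msg["frame_packing_arrangement_id"])
    w.u(1, msg["frame_packing_arrangement_cancel_flag"])
    if not msg["frame_packing_arrangement_cancel_flag"]:
        for name, width in FIELDS:
            w.u(width, msg[name])
        if has_grid(msg):
            for name in GRID:
                w.u(4, msg[name])
        w.u(8, msg["frame_packing_arrangement_reserved_byte"])
        w.ue(msg["frame_packing_arrangement_repetition_period"])
    w.u(1, msg["frame_packing_arrangement_extension_flag"])


cancel = {"frame_packing_arrangement_id": 0, "frame_packing_arrangement_cancel_flag": 1,
          "frame_packing_arrangement_extension_flag": 0}
w = BitWriter()
write_fpa(w, cancel)
print(parse_fpa(BitReader(w.to_bytes())) == cancel)  # True
```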

The frame packing arrangement SEI message may inform a video decoder, such as video decoder 30, that the output decoded picture contains samples of a frame consisting of multiple distinct spatially packed constituent frames using an indicated frame packing arrangement scheme. In accordance with the techniques of this disclosure, the frame may comprise an asymmetric packed frame. The information of the SEI message can be used by the decoder to rearrange the samples and process the samples of the constituent frames appropriately for display or other purposes. This SEI message may be associated with pictures that are either frames or fields. The frame packing arrangement of the samples may be specified in terms of the sampling structure of a frame in order to define a frame packing arrangement structure that is invariant with respect to whether a picture is a single field of such a packed frame or is a complete packed frame.

Video encoder 20 may set frame_packing_arrangement_id to a value containing an identifying number that may be used to identify the usage of the frame packing arrangement SEI message. Video encoder 20 may set the value of frame_packing_arrangement_id in the range of 0 to 2³²−2, inclusive. Values of frame_packing_arrangement_id from 0 to 255 and from 512 to 2³¹−1 may be used as determined by video encoder 20. Values of frame_packing_arrangement_id from 256 to 511 and from 2³¹ to 2³²−2 may be reserved for future use by ITU-T | ISO/IEC. Video decoders may ignore (e.g., remove from the bitstream and discard) all frame packing arrangement SEI messages containing a value of frame_packing_arrangement_id in the range of 256 to 511 or in the range of 2³¹ to 2³²−2.
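These ranges can be captured in a small check that a decoder might apply before acting on a frame packing arrangement SEI message. This is a sketch, and the function name is hypothetical.

```python
MAX_FPA_ID = 2**32 - 2  # largest allowed frame_packing_arrangement_id


def is_reserved_fpa_id(fpa_id: int) -> bool:
    """Return True if the id lies in a range reserved for future ITU-T | ISO/IEC use."""
    if not 0 <= fpa_id <= MAX_FPA_ID:
        raise ValueError("frame_packing_arrangement_id must be in 0..2**32-2")
    return 256 <= fpa_id <= 511 or 2**31 <= fpa_id <= MAX_FPA_ID


# Keep only the messages whose ids a decoder would act on.
ids = [0, 255, 256, 511, 512, 2**31 - 1, 2**31]
print([i for i in ids if not is_reserved_fpa_id(i)])  # [0, 255, 512, 2147483647]
```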

Video encoder 20 may set the value of frame_packing_arrangement_cancel_flag equal to 1 to indicate that the frame packing arrangement SEI message cancels the persistence of any previous frame packing arrangement SEI message in output order. Video encoder 20 may set the value of frame_packing_arrangement_cancel_flag equal to 0 to indicate that frame packing arrangement information follows.

Video encoder 20 may set the value of asymmetric_packing_idc (asymmetric packing indicator) to indicate a type of asymmetric coding. For example, video encoder 20 may set asymmetric_packing_idc to a value of 0 to indicate that the two constituent frames have the same resolution, that is, that the corresponding frame is not an asymmetric packed frame. Video encoder 20 may set the value of asymmetric_packing_idc larger than 0 (e.g., 1 or 2) to indicate that the two constituent frames have different resolutions. For example, one of the frames may have one-half the resolution of the other.

In one example, video encoder 20 may set the value of asymmetric_packing_idc equal to 1 to indicate that the two constituent frames have different resolutions, and that frame 1 has half the resolution of frame 0. In one example, video encoder 20 may set the value of asymmetric_packing_idc equal to 2 to indicate that the two constituent frames have different resolutions, and that frame 0 has half the resolution of frame 1. The value 3 for asymmetric_packing_idc is currently unspecified and reserved for future use. Table 2 below provides one example for interpreting the value of asymmetric_packing_idc:

TABLE 2
asymmetric_packing_idc
Value  Example Interpretation
0      Frame 0 and frame 1 have the same resolution.
1      Frame 1 has half the resolution of frame 0: when frame_packing_arrangement_type is 3, frame 1 has the same height as frame 0 and half the width of frame 0; when frame_packing_arrangement_type is 4, frame 1 has the same width as frame 0 and half the height of frame 0.
2      Frame 0 has half the resolution of frame 1: when frame_packing_arrangement_type is 3, frame 0 has the same height as frame 1 and half the width of frame 1; when frame_packing_arrangement_type is 4, frame 0 has the same width as frame 1 and half the height of frame 1.
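Table 2 implies how the dimensions of the two constituent frames follow from the dimensions of the decoded frame. The sketch below derives them. It assumes, although the table does not state it, that the two constituent frames exactly fill the decoded frame with no padding. The function name is hypothetical. For example, a 2880×1080 side-by-side frame with asymmetric_packing_idc equal to 1 would contain a 1920×1080 frame 0 and a 960×1080 frame 1.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height)


def constituent_sizes(width: int, height: int, asymmetric_packing_idc: int,
                      frame_packing_arrangement_type: int) -> Tuple[Size, Size]:
    """Return ((w0, h0), (w1, h1)) for types 3 (side-by-side) and 4 (top-bottom)."""
    if frame_packing_arrangement_type not in (3, 4):
        raise ValueError("only types 3 and 4 are handled in this sketch")
    if asymmetric_packing_idc not in (0, 1, 2):
        raise ValueError("asymmetric_packing_idc 3 is reserved")
    side_by_side = frame_packing_arrangement_type == 3
    total = width if side_by_side else height  # the dimension that is split
    if asymmetric_packing_idc == 0:
        if total % 2:
            raise ValueError("split dimension must be even")
        part0 = part1 = total // 2
    else:
        if total % 3:
            raise ValueError("split dimension must be a multiple of 3")
        large, small = 2 * total // 3, total // 3
        # idc 1: frame 1 is the half-resolution frame; idc 2: frame 0 is.
        part0, part1 = (large, small) if asymmetric_packing_idc == 1 else (small, large)
    if side_by_side:
        return (part0, height), (part1, height)
    return (width, part0), (width, part1)


print(constituent_sizes(2880, 1080, 1, 3))  # ((1920, 1080), (960, 1080))
```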

Video encoder 20 may set the value of frame_packing_arrangement_type to indicate the type of packing arrangement of the frames, as specified in Table 3 below. When video encoder 20 sets the value of asymmetric_packing_idc to a value larger than 0 (e.g., 1 or 2), video encoder 20 may set the value of frame_packing_arrangement_type to 6, 7, 8, or 9.

TABLE 3

| frame_packing_arrangement_type value | Example interpretation |
|---|---|
| 0 | Each component plane of the decoded frames contains a “checkerboard” based interleaving of corresponding planes of two constituent frames. |
| 1 | Each component plane of the decoded frames contains a column based interleaving of corresponding planes of two constituent frames. |
| 2 | Each component plane of the decoded frames contains a row based interleaving of corresponding planes of two constituent frames. |
| 3 | Each component plane of the decoded frames contains a side-by-side packing arrangement of corresponding planes of two constituent frames. |
| 4 | Each component plane of the decoded frames contains a top-bottom packing arrangement of corresponding planes of two constituent frames. |
| 5 | The component planes of the decoded frames in output order form a temporal interleaving of alternating first and second constituent frames. |
| 6, 7 | Each component plane of the decoded frames contains a side-by-side packing arrangement of corresponding planes of two constituent frames as illustrated in FIG. 5, wherein only the right frame needs upconversion (in this example). Frame 0 and frame 1 have the same height. The value 6 indicates that frame 1 has half the width of frame 0; the value 7 indicates that frame 0 has half the width of frame 1. |
| 8, 9 | Each component plane of the decoded frames contains a top-bottom packing arrangement of corresponding planes of two constituent frames as illustrated in FIG. 4, wherein only the bottom frame needs upconversion (in this example). Frame 0 and frame 1 have the same width. The value 8 indicates that frame 1 has half the height of frame 0; the value 9 indicates that frame 0 has half the height of frame 1. |
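The following Python sketch, offered only as an example under stated assumptions, derives the size of the decoded (packed) frame and the index of the constituent frame that needs upconversion for frame_packing_arrangement_type values 6 through 9 of Table 3. The table name, the function name, and the assumption that the full resolution constituent frame has even dimensions are all illustrative choices.

```python
# frame_packing_arrangement_type -> (layout, index of the half-resolution frame)
ASYMMETRIC_TYPES = {
    6: ("side-by-side", 1),  # frame 1 has half the width of frame 0
    7: ("side-by-side", 0),  # frame 0 has half the width of frame 1
    8: ("top-bottom", 1),    # frame 1 has half the height of frame 0
    9: ("top-bottom", 0),    # frame 0 has half the height of frame 1
}

def packed_frame_layout(frame_packing_arrangement_type, full_width, full_height):
    """Return (packed_width, packed_height, upconverted_frame_index)."""
    if frame_packing_arrangement_type not in ASYMMETRIC_TYPES:
        raise ValueError("not an asymmetric frame_packing_arrangement_type")
    layout, half_index = ASYMMETRIC_TYPES[frame_packing_arrangement_type]
    if layout == "side-by-side":
        # Same height; the reduced frame contributes half the width.
        return full_width + full_width // 2, full_height, half_index
    # Top-bottom: same width; the reduced frame contributes half the height.
    return full_width, full_height + full_height // 2, half_index
```

For example, a 1920x1080 view packed with type 8 would yield a 1920x1620 decoded frame in which frame 1 needs upconversion.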

Video encoder 20 may set the value of quincunx_sampling_flag equal to 1 to indicate that each color component plane of each constituent frame is quincunx sampled. Video encoder 20 may set the value of quincunx_sampling_flag equal to 0 to indicate that the color component planes of each constituent frame are not quincunx sampled. When video encoder 20 sets the value of frame_packing_arrangement_type equal to 0, video encoder 20 may also set the value of quincunx_sampling_flag equal to 1. When video encoder 20 sets the value of frame_packing_arrangement_type equal to 5, video encoder 20 may also set the value of quincunx_sampling_flag equal to 0.

Video encoder 20 may set the value of content_interpretation_type to indicate the intended interpretation of the constituent frames, as specified in Table 4. Values of content_interpretation_type that do not appear in Table 4 may be reserved for future specification by ITU-T | ISO/IEC. For each specified frame packing arrangement scheme, there may be two constituent frames (pictures), referred to in Table 4 as frame 0 and frame 1.

TABLE 4

| content_interpretation_type value | Example interpretation |
|---|---|
| 0 | Unspecified relationship between the frame packed constituent frames |
| 1 | Indicates that the two constituent frames form the left and right views of a stereo view scene, with frame 0 being associated with the left view and frame 1 being associated with the right view |
| 2 | Indicates that the two constituent frames form the right and left views of a stereo view scene, with frame 0 being associated with the right view and frame 1 being associated with the left view |
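A minimal sketch of how a decoder might apply Table 4 when routing constituent frames to the left-eye and right-eye outputs. The function name is hypothetical, and treating reserved values as an error is an assumption made for illustration.

```python
def view_assignment(content_interpretation_type):
    """Return (left_frame_index, right_frame_index) per Table 4,
    or None when the relationship is unspecified."""
    if content_interpretation_type == 0:
        return None      # unspecified relationship
    if content_interpretation_type == 1:
        return 0, 1      # frame 0 = left view, frame 1 = right view
    if content_interpretation_type == 2:
        return 1, 0      # frame 0 = right view, frame 1 = left view
    raise ValueError("reserved content_interpretation_type")
```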

Video encoder 20 may set the value of spatial_flipping_flag equal to 1, when the value of frame_packing_arrangement_type is equal to 3 or 4, to indicate that one of the two constituent frames is spatially flipped relative to its intended orientation for display or other such purposes. When frame_packing_arrangement_type is equal to 3 or 4 and spatial_flipping_flag is equal to 1, the type of spatial flipping that is indicated may be as follows. If frame_packing_arrangement_type is equal to 3, the indicated spatial flipping is horizontal flipping. Otherwise (that is, when the value of frame_packing_arrangement_type is equal to 4), the indicated spatial flipping is vertical flipping.

When frame_packing_arrangement_type is not equal to 3 or 4, video encoder 20 may set the value of spatial_flipping_flag equal to 0. When frame_packing_arrangement_type is not equal to 3 or 4, the value 1 for spatial_flipping_flag may be reserved for future use by ITU-T | ISO/IEC, and video decoders may ignore the value 1 for spatial_flipping_flag.

Video encoder 20 may use frame0_flipped_flag to indicate which one of the two constituent frames is flipped. When spatial_flipping_flag is equal to 1, video encoder 20 may set the value of frame0_flipped_flag equal to 0 to indicate that frame 0 is not spatially flipped and frame 1 is spatially flipped, or may set the value of frame0_flipped_flag equal to 1 to indicate that frame 0 is spatially flipped and frame 1 is not spatially flipped.

When video encoder 20 sets the value of spatial_flipping_flag equal to 0, video encoder 20 may set the value of frame0_flipped_flag equal to 0. When spatial_flipping_flag is equal to 0, the value 1 for frame0_flipped_flag may be reserved for future use by ITU-T | ISO/IEC, and video decoders may ignore the value of frame0_flipped_flag.
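The flipping semantics above can be sketched as follows. This hypothetical helper represents each constituent frame as a list of sample rows and restores the intended orientation of whichever frame frame0_flipped_flag identifies. Because a horizontal or vertical flip is its own inverse, undoing the flip is the same operation as applying it.

```python
def unflip(frames, frame_packing_arrangement_type, spatial_flipping_flag,
           frame0_flipped_flag):
    """frames: (frame0, frame1), each a list of rows. Returns the pair with the
    signaled spatial flip undone."""
    if frame_packing_arrangement_type not in (3, 4) or spatial_flipping_flag == 0:
        return frames  # no flip signaled (or flags ignored for other types)
    idx = 0 if frame0_flipped_flag == 1 else 1
    frame = frames[idx]
    if frame_packing_arrangement_type == 3:
        restored = [row[::-1] for row in frame]  # horizontal flip
    else:
        restored = frame[::-1]                   # vertical flip
    out = list(frames)
    out[idx] = restored
    return tuple(out)
```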

When video encoder 20 sets the value of quincunx_sampling_flag equal to 0, video encoder 20 may provide spatial location reference information to specify the location of the upper left luma sample of each constituent frame relative to a spatial reference point. Video encoder 20 may indicate the location of chroma samples relative to luma samples by the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field syntax elements in video usability information (VUI) parameters.

Video encoder 20 may set the value of field_views_flag equal to 1 to indicate that all pictures in the current coded video sequence are coded as complementary field pairs. All fields of a particular parity may be considered a first constituent frame, and all fields of the opposite parity may be considered a second constituent frame. When video encoder 20 does not set the value of frame_packing_arrangement_type equal to 2, video encoder 20 may set the value of field_views_flag equal to 0. When video encoder 20 does not set the value of frame_packing_arrangement_type equal to 2, the value 1 for field_views_flag may be reserved for future use by ITU-T | ISO/IEC. When frame_packing_arrangement_type is not equal to 2, video decoders may ignore the value of field_views_flag.

Video encoder 20 may set the value of current_frame_is_frame0_flag equal to 1, when frame_packing_arrangement_type is equal to 5, to indicate that the current decoded frame is constituent frame 0 and the next decoded frame in output order is constituent frame 1, and that the display time of constituent frame 0 should be delayed to coincide with the display time of constituent frame 1. Accordingly, a video decoder, such as video decoder 30, may delay the display time of constituent frame 0 to coincide with the display time of constituent frame 1. Video encoder 20 may set the value of current_frame_is_frame0_flag equal to 0, when frame_packing_arrangement_type is equal to 5, to indicate that the current decoded frame is constituent frame 1 and the previous decoded frame in output order is constituent frame 0, and that the display time of constituent frame 1 should not be delayed for purposes of stereo-view pairing. Accordingly, a video decoder, such as video decoder 30, need not delay the display time of constituent frame 1 when the value of current_frame_is_frame0_flag is equal to 0.

When video encoder 20 does not set the value of frame_packing_arrangement_type equal to 5, the constituent frame associated with the upper-left sample of the decoded frame may be considered to be constituent frame 0, and the other constituent frame may be considered to be constituent frame 1. When frame_packing_arrangement_type is not equal to 5, video encoder 20 may set the value of current_frame_is_frame0_flag equal to 0. When frame_packing_arrangement_type is not equal to 5, the value 1 for current_frame_is_frame0_flag may be reserved for future use by ITU-T | ISO/IEC. When frame_packing_arrangement_type is not equal to 5, decoders may ignore the value of current_frame_is_frame0_flag.
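For frame_packing_arrangement_type equal to 5, the display-delay behavior described above might be sketched as follows. The function name and the (frame, current_frame_is_frame0_flag) input format are assumptions. In this sketch a frame 1 that arrives without a preceding frame 0 is simply dropped, which is one possible policy rather than a specified one.

```python
def pair_temporal_frames(decoded):
    """decoded: list of (frame, current_frame_is_frame0_flag) in output order.
    Returns (frame0, frame1) stereo pairs; each frame 0 is held back so that
    it is displayed at the display time of the frame 1 that follows it."""
    pairs = []
    pending = None
    for frame, is_frame0 in decoded:
        if is_frame0:
            pending = frame                 # delay frame 0 until frame 1 arrives
        elif pending is not None:
            pairs.append((pending, frame))  # display both at frame 1's time
            pending = None
    return pairs
```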

Video encoder 20 may set the value of frame0_self_contained_flag equal to 1 to indicate that no inter prediction operations within the decoding process for the samples of constituent frame 0 of the coded video sequence refer to samples of any constituent frame 1. Video encoder 20 may set the value of frame0_self_contained_flag equal to 0 to indicate that some inter prediction operations within the decoding process for the samples of constituent frame 0 of the coded video sequence may or may not refer to samples of some constituent frame 1. When frame_packing_arrangement_type is equal to 0 or 1, video encoder 20 may set the value of frame0_self_contained_flag equal to 0. When frame_packing_arrangement_type is equal to 0 or 1, the value 1 for frame0_self_contained_flag may be reserved for future use by ITU-T | ISO/IEC. When frame_packing_arrangement_type is equal to 0 or 1, video decoders may ignore the value of frame0_self_contained_flag. Within a coded video sequence, video encoder 20 may set the value of frame0_self_contained_flag in all frame packing arrangement SEI messages to the same value.

Video encoder 20 may set the value of frame1_self_contained_flag equal to 1 to indicate that no inter prediction operations within the decoding process for the samples of constituent frame 1 of the coded video sequence refer to samples of any constituent frame 0. Video encoder 20 may set the value of frame1_self_contained_flag equal to 0 to indicate that some inter prediction operations within the decoding process for the samples of constituent frame 1 of the coded video sequence may or may not refer to samples of some constituent frame 0. When frame_packing_arrangement_type is equal to 0 or 1, it is a requirement of bitstream conformance that frame1_self_contained_flag shall be equal to 0. When frame_packing_arrangement_type is equal to 0 or 1, the value 1 for frame1_self_contained_flag may be reserved for future use by ITU-T | ISO/IEC. When frame_packing_arrangement_type is equal to 0 or 1, video decoders may ignore the value of frame1_self_contained_flag. Within a coded video sequence, video encoder 20 may set the value of frame1_self_contained_flag in all frame packing arrangement SEI messages to the same value.

When frame0_self_contained_flag is equal to 1 or frame1_self_contained_flag is equal to 1, and frame_packing_arrangement_type is equal to 2, the decoded frame may be a non-MBAFF frame, that is, a frame that is not coded using macroblock-adaptive frame/field (MBAFF) coding.

In some examples, video encoder 20 may set both frame0_self_contained_flag and frame1_self_contained_flag equal to 1. In this manner, video encoder 20 may signal that the respective views can be decoded and rendered separately.

Video encoder 20 may set the value of frame0_grid_position_x (when present) to specify the horizontal location of the upper left sample of constituent frame 0 to the right of the spatial reference point, in units of one sixteenth of the luma sample grid spacing between the samples of the columns of constituent frame 0 that are present in the decoded frame (prior to any upsampling for display or other purposes).

Video encoder 20 may set the value of frame0_grid_position_y (when present) to specify the vertical location of the upper left sample of constituent frame 0 below the spatial reference point, in units of one sixteenth of the luma sample grid spacing between the samples of the rows of constituent frame 0 that are present in the decoded frame (prior to any upsampling for display or other purposes).

Video encoder 20 may set the value of frame1_grid_position_x (when present) to specify the horizontal location of the upper left sample of constituent frame 1 to the right of the spatial reference point, in units of one sixteenth of the luma sample grid spacing between the samples of the columns of constituent frame 1 that are present in the decoded frame (prior to any upsampling for display or other purposes).

Video encoder 20 may set the value of frame1_grid_position_y (when present) to specify the vertical location of the upper left sample of constituent frame 1 below the spatial reference point, in units of one sixteenth of the luma sample grid spacing between the samples of the rows of constituent frame 1 that are present in the decoded frame (prior to any upsampling for display or other purposes).
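Because the grid positions are expressed in sixteenths of a sample spacing, a decoder may convert them to fractional offsets as in the following sketch. The 4-bit value range (0 to 15) and the function name are assumptions made for illustration. The returned offsets are in units of the constituent frame's own sample spacing, before any upsampling.

```python
from fractions import Fraction

def grid_offset(grid_position_x, grid_position_y):
    """Convert frameN_grid_position_x/y into (x, y) offsets, in samples of the
    constituent frame, from the spatial reference point."""
    for value in (grid_position_x, grid_position_y):
        if not 0 <= value <= 15:  # assumed 4-bit syntax elements
            raise ValueError("grid position out of assumed range 0..15")
    # Each unit is 1/16 of the constituent frame's luma sample grid spacing.
    return Fraction(grid_position_x, 16), Fraction(grid_position_y, 16)
```

For example, a frame0_grid_position_x of 8 places the upper left sample of constituent frame 0 half a sample spacing to the right of the reference point.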

The syntax element frame_packing_arrangement_reserved_byte may be reserved for future use by ITU-T | ISO/IEC. Video encoder 20 may set the value of frame_packing_arrangement_reserved_byte equal to 0; all other values of frame_packing_arrangement_reserved_byte may be reserved for future use by ITU-T | ISO/IEC. Video decoders may ignore (e.g., remove from the bitstream and discard) the value of frame_packing_arrangement_reserved_byte.

Video encoder 20 may set the value of frame_packing_arrangement_repetition_period to specify the persistence of the frame packing arrangement SEI message. This value may also specify a frame order count interval within which the bitstream produced by video encoder 20 contains either another frame packing arrangement SEI message with the same value of frame_packing_arrangement_id or the end of the coded video sequence. Video encoder 20 may set the value of frame_packing_arrangement_repetition_period in the range of 0 to 16,384, inclusive.

Video encoder 20 may set the value of frame_packing_arrangement_repetition_period equal to 0 to specify that the frame packing arrangement SEI message applies to the current decoded frame only. Video encoder 20 may set the value of frame_packing_arrangement_repetition_period equal to 1 to specify that the frame packing arrangement SEI message persists in output order until either of the following conditions is true: a new coded video sequence begins, or a frame in an access unit containing a frame packing arrangement SEI message with the same value of frame_packing_arrangement_id is output having PicOrderCnt( ) greater than PicOrderCnt(CurrPic).

Video encoder 20 may set the value of frame_packing_arrangement_repetition_period equal to 0 or equal to 1 to indicate that another frame packing arrangement SEI message with the same value of frame_packing_arrangement_id may or may not be present. Video encoder 20 may set the value of frame_packing_arrangement_repetition_period greater than 1 to specify that the frame packing arrangement SEI message persists until either of the following conditions is true: a new coded video sequence begins, or a frame in an access unit containing a frame packing arrangement SEI message with the same value of frame_packing_arrangement_id is output having PicOrderCnt( ) greater than PicOrderCnt(CurrPic) and less than or equal to PicOrderCnt(CurrPic) + frame_packing_arrangement_repetition_period.

Video encoder 20 may set the value of frame_packing_arrangement_repetition_period greater than 1 to indicate that another frame packing arrangement SEI message with the same value of frame_packing_arrangement_id is present for a frame in an access unit that is output having PicOrderCnt( ) greater than PicOrderCnt(CurrPic) and less than or equal to PicOrderCnt(CurrPic) + frame_packing_arrangement_repetition_period, unless the bitstream ends or a new coded video sequence begins without output of such a frame.
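The persistence rules above can be approximated by the following Python sketch. It is a simplification under stated assumptions: frames are identified only by their PicOrderCnt values, coded video sequence boundaries are passed in as a flag rather than modeled, and the function names are hypothetical.

```python
def sei_persists(period, sei_poc, frame_poc, next_sei_poc=None):
    """True if an SEI message received with the frame at PicOrderCnt sei_poc
    still applies to the frame at PicOrderCnt frame_poc. next_sei_poc is the
    PicOrderCnt of the next output frame carrying an SEI message with the same
    frame_packing_arrangement_id (None if none has been seen)."""
    if not 0 <= period <= 16384:
        raise ValueError("repetition period out of range")
    if period == 0:
        return frame_poc == sei_poc      # current decoded frame only
    if frame_poc < sei_poc:
        return False                     # persistence runs forward in output order
    return next_sei_poc is None or frame_poc < next_sei_poc

def repetition_constraint_met(period, sei_poc, next_sei_poc, sequence_ended=False):
    """For period > 1, the next SEI message with the same id should appear at a
    PicOrderCnt in (sei_poc, sei_poc + period], unless the bitstream ends or a
    new coded video sequence begins first."""
    if period <= 1 or sequence_ended:
        return True
    return next_sei_poc is not None and sei_poc < next_sei_poc <= sei_poc + period
```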

Video encoder 20 may set the value of frame_packing_arrangement_extension_flag equal to 0 to indicate that no additional data follows within the frame packing arrangement SEI message. In this example, video encoder 20 may always set the value of frame_packing_arrangement_extension_flag equal to 0; the value 1 for frame_packing_arrangement_extension_flag may be reserved for future use by ITU-T | ISO/IEC. Video decoders may ignore the value 1 for frame_packing_arrangement_extension_flag in a frame packing arrangement SEI message and may ignore all data that follows the value 1 for frame_packing_arrangement_extension_flag within that SEI message.

Mode select unit 40 may receive raw video data in the form of blocks from the view 0 picture. After encoding the view 0 picture, video encoder 20 may downsample a view 1 picture that corresponds to the view 0 picture. That is, the view 0 picture and the view 1 picture may have been captured at substantially the same time. After downsampling the view 1 picture, video encoder 20 may encode the view 1 picture. Video encoder 20 may also store a decoded version of the view 0 picture in reference frame store 64, such that motion estimation unit 42 and motion compensation unit 44 may perform inter-view prediction with respect to the view 0 picture when encoding the view 1 picture.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a predictive block within a predictive reference frame (or other coded unit) relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. A motion vector may also indicate displacement of a partition of a macroblock. Motion compensation may involve fetching or generating the predictive block based on the motion vector (or displacement vector) determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples.
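Block matching by SAD or SSD, as described above, can be sketched with an integer-pel full search. This Python sketch is illustrative only: the function names, the list-of-rows frame representation, and the exhaustive search window are assumptions, and sub-integer interpolation is omitted.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, x, y, w, h):
    """Extract the w x h block whose upper left sample is at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]

def full_search(cur, ref, bx, by, bw, bh, search_range, metric=sad):
    """Return the (dx, dy) motion vector minimizing metric between the current
    block at (bx, by) and a same-sized block of the reference frame."""
    target = get_block(cur, bx, by, bw, bh)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or x + bw > len(ref[0]) or y + bh > len(ref):
                continue
            cost = metric(target, get_block(ref, x, y, bw, bh))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Usage: the current frame is the reference shifted left by one sample.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, 2, 2, 2, 2, 2))  # (1, 0)
```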

Motion estimation unit 42 may calculate a motion vector for a video block of an inter-coded picture by comparing the video block to video blocks of a reference frame in reference frame store 64. Motion compensation unit 44 may also interpolate sub-integer pixels of the reference frame, e.g., an I-frame or a P-frame. The ITU-T H.264 standard refers to “lists” of reference frames, e.g., list 0 and list 1. List 0 includes reference frames having a display order earlier than the current picture, while list 1 includes reference frames having a display order later than the current picture. Motion estimation unit 42 compares blocks of one or more reference frames from reference frame store 64 to a block to be encoded of a current picture, e.g., a P-picture or a B-picture. When the reference frames in reference frame store 64 include values for sub-integer pixels, a motion vector calculated by motion estimation unit 42 may refer to a sub-integer pixel location of a reference frame. Motion estimation unit 42 sends the calculated motion vector to entropy coding unit 56 and motion compensation unit 44. The reference frame block identified by a motion vector may be referred to as a predictive block. Motion compensation unit 44 calculates residual error values for the predictive block of the reference frame.

Motion estimation unit 42 may be configured to perform inter-view prediction for view 1 pictures, in which case motion estimation unit 42 may calculate displacement vectors between blocks of the view 1 picture and corresponding blocks of a reference frame of view 0. When calculating a displacement vector, motion estimation unit 42 may set the value of the motion vector relative to the position of the current block in the reduced resolution picture separate from the asymmetric frame, rather than a position of the current block as positioned within the asymmetric packed frame.

Suppose, for example, that the current block of the reduced resolution picture is located at position (x₀, y₀) within the asymmetric packed frame. Suppose further that video encoder 20 will pack the asymmetric frame with a top-bottom frame packing arrangement, and that the full resolution picture has a height of h pixels and a width of w pixels. Accordingly, motion estimation unit 42 may calculate the displacement vector relative to (x₀, 2*(y₀−h)). As another example, suppose instead that video encoder 20 will pack the asymmetric frame with a side-by-side frame packing arrangement. In this example, motion estimation unit 42 may calculate the displacement vector relative to (2*(x₀−w), y₀). Thus, motion estimation unit 42 may calculate the displacement vector relative to the position of the current block in the reduced resolution frame standing alone (scaled to the full resolution sampling grid), rather than the position of the current block in the asymmetric frame. Motion compensation unit 44 may calculate prediction data based on the predictive block. Video encoder 20 forms a residual video block, indicating residual error between the pixel values of the block to be coded and the predictive block, by subtracting the prediction data from motion compensation unit 44 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation.
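The coordinate mapping in the example above can be expressed as a small Python sketch. It assumes that (x₀, y₀) is given in the coordinates of the asymmetric packed frame and that the reduced resolution picture follows the full resolution picture (below it, or to its right); the function names and the string arrangement labels are hypothetical. The factor of 2 maps the half-resolution coordinate onto the sampling grid of the full resolution view 0 reference picture.

```python
def reference_position(x0, y0, arrangement, w, h):
    """Position, relative to which the displacement vector is calculated, of
    the current block located at (x0, y0) in the asymmetric packed frame.
    (w, h) is the size of the full resolution picture."""
    if arrangement == "top-bottom":
        if y0 < h:
            raise ValueError("block lies in the full resolution picture")
        return x0, 2 * (y0 - h)   # remove the offset h, then rescale rows
    if arrangement == "side-by-side":
        if x0 < w:
            raise ValueError("block lies in the full resolution picture")
        return 2 * (x0 - w), y0   # remove the offset w, then rescale columns
    raise ValueError("unknown arrangement: " + arrangement)

def displacement_vector(cur_pos, ref_block_pos):
    """Displacement from the current block position to the matching block
    position in the view 0 reference picture."""
    return ref_block_pos[0] - cur_pos[0], ref_block_pos[1] - cur_pos[1]
```

For example, with a 1920x1080 full resolution picture packed top-bottom, a block at (16, 1096) in the packed frame maps to (16, 32).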

Alternatively, video encoder 20 may be configured to encode view 1 pictures as fields. Rather than encoding a pair of interlaced top and bottom fields for view 1 pictures, however, video encoder 20 may be configured to encode only a single field for each of the view 1 pictures. Video encoder 20 may further encode the view 1 pictures as fields relative to either fields of previously coded view 1 pictures or top or bottom fields of view 0 pictures. Each of the previously coded view 0 pictures may include both a top field and a bottom field. It should be understood that although video encoder 20 may be configured to encode view 1 pictures as fields, video encoder 20 may still encode view 0 pictures as frames.

To encode a picture of view 1 as a field, motion estimation unit 42 may be configured to compare the picture of view 1 to previously coded pictures of view 1 that have been decoded, or to top or bottom fields of previously coded view 0 pictures that have been decoded. Field coded pictures may comprise one-half of the vertical resolution of the full resolution picture. In general, a field of a picture may comprise a top field, comprising even-numbered lines of the picture, or a bottom field, comprising odd-numbered lines of the picture. Accordingly, to encode a view 1 picture relative to a previously coded view 1 picture that is now decoded, video encoder 20 may select a field for the view 1 picture (e.g., a top field or a bottom field of the view 1 picture), select a previously coded view 1 picture that is now decoded as a reference picture, and calculate the difference between the selected field and the reference picture. Similarly, to encode a view 1 picture relative to a previously coded view 0 picture that is now decoded, video encoder 20 may perform similar steps, but additionally determine whether to encode the view 1 picture relative to a top field or a bottom field of the view 0 reference picture.
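The field selection described above can be sketched in Python. Following the convention in this paragraph, the top field holds the even-numbered rows and the bottom field holds the odd-numbered rows (counting from row 0). Choosing the view 0 reference field by the smaller sum of absolute differences is an assumed criterion for illustration, and the function names are hypothetical.

```python
def split_fields(picture):
    """Split a picture (list of rows) into its top and bottom fields."""
    top = picture[0::2]     # even-numbered rows: 0, 2, 4, ...
    bottom = picture[1::2]  # odd-numbered rows: 1, 3, 5, ...
    return top, bottom

def best_reference_field(view1_field, view0_picture):
    """Pick the top or bottom field of a decoded view 0 picture as the
    reference for a view 1 field, using SAD as the (assumed) criterion."""
    def sad(a, b):
        return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    top, bottom = split_fields(view0_picture)
    if sad(view1_field, top) <= sad(view1_field, bottom):
        return "top", top
    return "bottom", bottom
```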

Transform unit 52 applies a transform, such as a discrete cosine transform (DCT), integer transform, or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform unit 52 may perform other transforms, such as those defined by the H.264 standard, which are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms or other types of transforms could also be used. In any case, transform unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. Transform unit 52 may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Quantization unit 54 quantizes the residual transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter.

Following quantization, entropy coding unit 56 entropy codes the quantized transform coefficients. For example, entropy coding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding technique. Following the entropy coding by entropy coding unit 56, the encoded video may be transmitted to another device or archived for later transmission or retrieval. In the case of CABAC, context may be based on neighboring macroblocks.

In some cases, entropy coding unit 56 or another unit of video encoder 20 may be configured to perform other coding functions, in addition to entropy coding. For example, entropy coding unit 56 may be configured to determine the coded block pattern (CBP) values for the macroblocks and partitions. Also, in some cases, entropy coding unit 56 may perform run length coding of the coefficients in a macroblock or partition thereof. In particular, entropy coding unit 56 may apply a zig-zag scan or other scan pattern to scan the transform coefficients in a macroblock or partition and encode runs of zeros for further compression. Entropy coding unit 56 also may construct header information with appropriate syntax elements for transmission in the encoded video bitstream.
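As a generic illustration of the zig-zag scan and zero-run coding mentioned above, the following Python sketch scans an N x N coefficient block along anti-diagonals and converts the result into (run, level) pairs, dropping trailing zeros. It is not the CAVLC or CABAC syntax of H.264; the function names and the (run, level) representation are assumptions.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()     # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order

def zigzag_scan(block):
    """Flatten a square coefficient block in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length(coeffs):
    """Encode as (zeros preceding the level, nonzero level); trailing zeros dropped."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

block = [[9, 0, 1, 0], [3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(run_length(zigzag_scan(block)))  # [(0, 9), (1, 3), (2, 1)]
```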

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference frame store 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference frame store 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

FIG. 3 is a block diagram illustrating an example of video decoder 30, which decodes an encoded video sequence. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 70, motion compensation unit 72, intra prediction unit 74, inverse quantization unit 76, inverse transformation unit 78, reference frame store 82 and summer 80. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 2).

In particular, video decoder 30 may be configured to receive a bitstream including asymmetric packed frames. Video decoder 30 may receive information indicative of whether the bitstream includes asymmetric packed frames and, if so, a frame packing arrangement for the asymmetric packed frames. For example, video decoder 30 may be configured to interpret frame packing arrangement SEI messages. Video decoder 30 may also be configured to determine whether to decode both pictures in an asymmetric packed frame, or only one of the two pictures, e.g., the full resolution picture. This determination may be based on whether video display 32 (FIG. 1) is able to display three-dimensional video data, whether video decoder 30 has the capability to decode two views (and upsample a reduced resolution view) of a particular bitrate and/or frame rate, or other factors regarding video decoder 30 and/or video display 32.

When destination device 40 is not able to decode and/or display three-dimensional video data from asymmetric packed frames, video decoder 30 may unpack received asymmetric frames into constituent full resolution encoded pictures and reduced resolution encoded pictures, then discard the reduced resolution encoded pictures. Thus, video decoder 30 may elect to only decode the full resolution pictures of, e.g., view 0. On the other hand, when destination device 40 is capable of decoding and displaying three-dimensional video data of asymmetric packed frames, video decoder 30 may unpack received asymmetric frames into constituent full and reduced resolution encoded pictures, decode the full and reduced resolution encoded pictures, upsample the reduced resolution picture, and send the pictures to video display 32. In some examples, video decoder 30 may receive asymmetric packed frames via HDMI.
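The decoding decision described above might be sketched as follows. The AsymmetricFrame container, the two capability flags, and the function name are hypothetical; a real decoder may weigh additional factors, such as bitrate and frame rate, as noted above.

```python
from dataclasses import dataclass

@dataclass
class AsymmetricFrame:
    full_res: object     # encoded full resolution picture (e.g., view 0)
    reduced_res: object  # encoded reduced resolution picture (e.g., view 1)

def pictures_to_decode(frame, display_supports_3d, decoder_supports_two_views):
    """Return the encoded pictures of an unpacked asymmetric frame to decode."""
    if display_supports_3d and decoder_supports_two_views:
        # Decode both; the reduced resolution picture is upsampled afterwards.
        return [frame.full_res, frame.reduced_res]
    # Otherwise decode only the full resolution picture; discard the other.
    return [frame.full_res]
```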

Video decoder 30 may further receive information indicating whether a reduced resolution encoded picture of an asymmetric frame is encoded as a field or as a frame. When the picture is encoded as a frame, video decoder 30 may retrieve displacement vectors for inter-view encoded reduced resolution pictures, or motion vectors for intra-view, inter-prediction encoded reduced resolution pictures. Video decoder 30 may use the displacement or motion vectors to retrieve a prediction block to decode a block of the reduced resolution picture. After decoding the reduced resolution picture, video decoder 30 may upsample the decoded picture to the same resolution as the full resolution picture of the same asymmetric frame.

Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70. Motion compensation unit 72 may use motion vectors received in the bitstream to identify a prediction block in reference frames in reference frame store 82. Intra prediction unit 74 may use intra prediction modes received in the bitstream to form a prediction block from spatially adjacent blocks. Inverse quantization unit 76 inverse quantizes, i.e., de-quantizes, the quantized block coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include a conventional process, e.g., as defined by the H.264 decoding standard. The inverse quantization process may also include use of a quantization parameter QP_Y calculated by video encoder 20 for each macroblock to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.

Inverse transform unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain. Motion compensation unit 72 produces motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers for interpolation filters to be used for motion estimation with sub-pixel precision may be included in the syntax elements. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block. Motion compensation unit 72 may determine the interpolation filters used by video encoder 20 according to received syntax information and use the interpolation filters to produce predictive blocks.

Motion compensation unit 72 uses some of the syntax information to determine sizes of macroblocks used to encode frame(s) of the encoded video sequence, partition information that describes how each macroblock of a frame of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (or lists) for each inter-encoded macroblock or partition, and other information to decode the encoded video sequence.

Summer 80 sums the residual blocks with the corresponding prediction blocks generated by motion compensation unit 72 or intra prediction unit 74 to form decoded blocks. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. The decoded video blocks are then stored in reference frame store 82, which provides reference blocks for subsequent motion compensation and also produces decoded video for presentation on a display device (such as video display 32 of FIG. 1).

When a reduced resolution picture of an asymmetric frame is encoded as a field, video decoder 30 may use a top field or a bottom field of a previously decoded picture of the other view as a reference field for decoding the reduced resolution encoded picture. Video decoder 30 may also use a previously decoded reduced resolution picture of the same view as a reference field, where the previously decoded reduced resolution pictures may be stored in reference frame store 82 prior to upsampling. In this manner, video decoder 30 may decode the reduced resolution encoded picture relative to a reduced resolution decoded picture of the same view, or relative to a top or bottom field of a full resolution decoded picture of the opposite view. After decoding the reduced resolution picture, video decoder 30 may store the reduced resolution decoded picture in reference frame store 82, then upsample the reduced resolution decoded picture to form a full resolution picture of the corresponding view.

FIG. 4 is a conceptual diagram illustrating pictures 100, 102 of a left eye view and a right eye view being combined by video encoder 20 to form an asymmetric packed frame 104. In this example, video encoder 20 receives picture 100, including raw video data of a left eye view of a scene, and picture 102, including raw video data of a right eye view of the scene. The left eye view may correspond to view 0, while the right eye view may correspond to view 1. Pictures 100, 102 may correspond to two pictures of the same temporal instance. For example, pictures 100, 102 may have been captured by cameras at substantially the same time.

In the example of FIG. 4, samples of picture 100 are indicated with X's, while samples (e.g., pixels) of picture 102 are indicated with O's. In this example, video encoder 20 encodes picture 100, downsamples and encodes picture 102, and combines the pictures to form asymmetric packed frame 104. Video encoder 20 arranges the full resolution encoded picture for picture 100 and the reduced resolution encoded picture for picture 102 in a top-bottom arrangement within asymmetric packed frame 104. To downsample picture 102, video encoder 20 may decimate alternate rows of picture 102. As another example, video encoder 20 may entirely remove alternate rows of picture 102 to produce a downsampled version of picture 102. As still another example, video encoder 20 may quincunx (checkerboard) sample picture 102, and arrange these samples in rows within asymmetric packed frame 104.
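The downsampling options above can be sketched on a picture stored as a list of rows. The row-decimation function keeps every other row. The quincunx function shows one possible way (an assumption, not necessarily the arrangement of FIG. 4) to collapse a checkerboard of samples into full-width rows: for each pair of rows, even columns come from the upper row and odd columns from the lower row. Both halve the height, so the result fits under a full resolution picture of the same width in a top-bottom frame. Function names are hypothetical.

```python
Picture = list[list[int]]


def decimate_rows(pic: Picture) -> Picture:
    """Keep rows 0, 2, 4, ... to halve the vertical resolution."""
    return [list(row) for row in pic[0::2]]


def quincunx_rows(pic: Picture) -> Picture:
    """Keep a checkerboard (quincunx) subset of samples, packed into half as many rows.

    For each pair of rows (2r, 2r + 1), even columns are taken from row 2r and odd
    columns from row 2r + 1. Assumes an even number of rows.
    """
    if len(pic) % 2:
        raise ValueError("picture height must be even")
    return [
        [pic[2 * r + (c % 2)][c] for c in range(len(pic[2 * r]))]
        for r in range(len(pic) // 2)
    ]


right = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(decimate_rows(right))
print(quincunx_rows(right))
```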

In the illustration of FIG. 4, asymmetric packed frame 104 includes X's corresponding to data from picture 100 and O's corresponding to data from picture 102. However, it should be understood that the data of asymmetric packed frame 104 corresponding to picture 102 will not necessarily align exactly with data of picture 102 following downsampling. Likewise, following encoding, the data of the pictures in asymmetric packed frame 104 will likely be different from the data of pictures 100, 102. Accordingly, it should not be assumed that the data of one X in asymmetric packed frame 104 is necessarily identical to a corresponding X in picture 100. Similarly, it should not be assumed that the data of one O in asymmetric packed frame 104 is identical to a corresponding O in picture 102, or that the O's of asymmetric packed frame 104 have the same resolution as O's of picture 102.

Asymmetric packed frame 104 may correspond to a top-bottom frame packing arrangement. That is, data corresponding to picture 100 is placed on top of data corresponding to picture 102 in asymmetric packed frame 104. Although illustrated in rows, data corresponding to picture 102 in asymmetric packed frame 104 may be quincunx (checkerboard) sampled, and thus, may be upsampled using a quincunx arrangement as well. Alternatively, data corresponding to picture 102 in asymmetric packed frame 104 may be sampled from alternate rows of picture 102, in which case the data may be upsampled by, e.g., interpolating alternate rows of the data following decoding.
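For the row-sampled case, upsampling after decoding can be sketched as follows. Decoded rows are placed at even positions and each missing row is filled with the rounded average of the rows above and below. This two-tap interpolation, the repetition of the last row at the bottom edge, and the function name are assumptions; a real system may use a longer filter.

```python
def upsample_rows(reduced: list[list[int]]) -> list[list[int]]:
    """Double the height of a row-decimated picture by linear interpolation."""
    out = []
    for i, row in enumerate(reduced):
        out.append(list(row))
        # The last row has no row below it, so it is repeated.
        below = reduced[i + 1] if i + 1 < len(reduced) else row
        out.append([(a + b + 1) // 2 for a, b in zip(row, below)])
    return out


print(upsample_rows([[0, 10], [20, 30]]))
```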

FIG. 5 is a conceptual diagram illustrating pictures 100, 102 of a left eye view and a right eye view being combined by video encoder 20 to form an asymmetric packed frame 106. In this example, video encoder 20 receives picture 100, including raw video data of a left eye view of a scene, and picture 102, including raw video data of a right eye view of the scene. Pictures 100, 102 may correspond to two pictures of the same temporal instance. For example, pictures 100, 102 may have been captured by cameras at substantially the same time.

In the example of FIG. 5, samples of picture 100 are indicated with X's, while samples of picture 102 are indicated with O's. In this example, video encoder 20 encodes picture 100, downsamples and encodes picture 102, and combines the pictures to form asymmetric packed frame 106. Video encoder 20 arranges the full resolution encoded picture for picture 100 and the reduced resolution encoded picture for picture 102 in a side-by-side arrangement within asymmetric packed frame 106. To downsample picture 102, video encoder 20 may decimate alternate columns of picture 102. Alternatively, video encoder 20 may entirely remove alternate columns of picture 102 to produce a downsampled version of picture 102.

In the illustration of FIG. 5, asymmetric packed frame 106 includes X's corresponding to data from picture 100 and O's corresponding to data from picture 102. However, it should be understood that the data of asymmetric packed frame 106 corresponding to picture 102 will not necessarily align exactly with data of picture 102 following downsampling. Likewise, following encoding, the data of the pictures in asymmetric packed frame 106 will likely be different from the data of pictures 100, 102. Accordingly, it should not be assumed that the data of one X in asymmetric packed frame 106 is necessarily identical to a corresponding X in picture 100. Similarly, it should not be assumed that the data of one O in asymmetric packed frame 106 is identical to a corresponding O in picture 102, or that the O's of asymmetric packed frame 106 have the same resolution as O's of picture 102.

Asymmetric packed frame 106 may correspond to a side-by-side frame packing arrangement. That is, data corresponding to picture 100 is arranged side-by-side with data corresponding to picture 102. Although illustrated in columns, data corresponding to picture 102 in asymmetric packed frame 106 may be quincunx (checkerboard) sampled, and thus, may be upsampled using a quincunx arrangement as well. Alternatively, data corresponding to picture 102 in asymmetric packed frame 106 may be sampled from alternate columns of picture 102, in which case the data may be upsampled by, e.g., interpolating alternate columns of the data following decoding.
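The side-by-side case works on columns instead of rows. The sketch below, again assuming simple two-tap linear interpolation, halves the width of the right eye view picture by keeping even columns, places it next to the full resolution left eye view picture, and restores the width after decoding. Function names are hypothetical.

```python
Picture = list[list[int]]


def decimate_columns(pic: Picture) -> Picture:
    """Keep columns 0, 2, 4, ... to halve the horizontal resolution."""
    return [row[0::2] for row in pic]


def pack_side_by_side(left: Picture, right: Picture) -> Picture:
    """Place the full resolution left picture beside the decimated right picture."""
    if len(left) != len(right):
        raise ValueError("both views must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, decimate_columns(right))]


def upsample_columns(reduced: Picture) -> Picture:
    """Double the width, filling each missing column with a rounded average."""
    out = []
    for row in reduced:
        new_row = []
        for j, v in enumerate(row):
            nxt = row[j + 1] if j + 1 < len(row) else v  # repeat the last column
            new_row += [v, (v + nxt + 1) // 2]
        out.append(new_row)
    return out


frame = pack_side_by_side([[1, 2, 3, 4]], [[5, 6, 7, 8]])
print(frame)
print(upsample_columns([row[4:] for row in frame]))
```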

FIG. 6 is a conceptual diagram illustrating an example process for encoding pictures 110A-110D (pictures 110) of a left eye view as frames, while encoding pictures 112A-112D (pictures 112) of a right eye view as fields. In this example, pictures 110 correspond to a left eye view (e.g., view 0), while pictures 112 correspond to a right eye view (e.g., view 1). In general, pictures 112 may comprise downsampled pictures of the right eye view. For example, a video processing unit may decimate rows of incoming pictures of the right eye view to produce pictures 112.

A video encoder, such as video encoder 20, or a video preprocessing unit coupled to the video encoder, may receive full resolution, unencoded pictures of the left eye view and the right eye view. Video encoder 20 may reduce the resolution of pictures of the right eye view by decimating the pictures of the right eye view. In this manner, video encoder 20 may produce pictures 112 that have one-half the vertical resolution of pictures 110, but the same horizontal resolution (width) as pictures 110.

The video encoder may encode pictures 110 normally, that is, as frames. However, in this example, video encoder 20 may encode pictures 112 as fields. Video encoder 20 may encode pictures 112 relative to previously encoded (and subsequently decoded) pictures of the right eye view, or previously encoded (and subsequently decoded) pictures of the left eye view. For example, video encoder 20 may encode pictures 112 relative to either the top field of one of pictures 110 or the bottom field of one of pictures 110. That is, video encoder 20 may use the top field of one of pictures 110 as a reference field to encode one of pictures 112, e.g., by calculating differences between rows of that one of pictures 112 and alternate rows (starting with the top row) of the one of pictures 110. Alternatively, video encoder 20 may use the bottom field of one of pictures 110 as a reference field, in which case video encoder 20 may calculate differences between rows of the one of pictures 112 and alternate rows (starting with the row after the top row) of the one of pictures 110. In general, video encoder 20 may encode pictures 112 relative to previously coded pictures 112 and top and/or bottom fields of previously coded pictures 110.
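The field-based prediction described above can be sketched with zero-motion prediction: the top field of a reference frame is its even rows and the bottom field is its odd rows, and the residual is the row-by-row difference between the field-coded right eye view picture and the chosen field. A real encoder would also search for motion or disparity within the reference field; that search is omitted here, and the function names are hypothetical.

```python
Picture = list[list[int]]


def field(frame: Picture, parity: int) -> Picture:
    """Return the top field (parity 0, even rows) or bottom field (parity 1, odd rows)."""
    if parity not in (0, 1):
        raise ValueError("parity must be 0 (top) or 1 (bottom)")
    return frame[parity::2]


def field_residual(cur: Picture, ref_frame: Picture, parity: int) -> Picture:
    """Differences between a field-coded picture and one field of a reference frame."""
    ref = field(ref_frame, parity)
    if len(ref) != len(cur):
        raise ValueError("current picture must have as many rows as the reference field")
    return [[c - r for c, r in zip(c_row, r_row)] for c_row, r_row in zip(cur, ref)]


left_ref = [[10, 10], [20, 20], [30, 30], [40, 40]]  # decoded view 0 frame
right_cur = [[12, 11], [33, 29]]                       # half-height view 1 picture
print(field_residual(right_cur, left_ref, parity=0))   # against the top field
print(field_residual(right_cur, left_ref, parity=1))   # against the bottom field
```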

Video encoder 20 may form independent access units from pictures 110 and pictures 112. Together, pictures 110A and 112A may form a stereo image pair. Likewise, pictures 110B and 112B may form a stereo image pair, pictures 110C and 112C may form a stereo image pair, and pictures 110D and 112D may form a stereo image pair. However, rather than forming an asymmetric frame including two images forming a stereo image pair, video encoder 20 may form independent access units from each of pictures 110 and 112. Video encoder 20 may output pictures 110 and 112 alternately, as illustrated in the example of FIG. 6. This technique may be referred to as frame field interleaved coding. Thus, video encoder 20 may form a bitstream including both pictures coded as frames and pictures coded as fields, and the field coded pictures may have reduced resolution relative to the frame coded pictures. Moreover, the field coded pictures may be coded relative to one or more of the frame or field coded pictures that occur earlier in the bitstream.
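The output order of frame field interleaved coding can be sketched as a list of access units, each tagged with its view, its coding structure (frame or field), and its temporal instance. The AccessUnit type and the function name are hypothetical, and the picture payloads are placeholder labels standing in for coded data.

```python
from typing import NamedTuple


class AccessUnit(NamedTuple):
    view: int        # 0 = left eye (full resolution), 1 = right eye (reduced resolution)
    structure: str   # "frame" or "field"
    time: int        # temporal instance shared by a stereo image pair
    data: object     # placeholder for the coded picture


def frame_field_interleave(left_pictures, right_pictures):
    """Emit L0, R0, L1, R1, ... as independent access units in one bitstream."""
    if len(left_pictures) != len(right_pictures):
        raise ValueError("each left picture needs a right picture of the same instance")
    units = []
    for t, (left, right) in enumerate(zip(left_pictures, right_pictures)):
        units.append(AccessUnit(0, "frame", t, left))
        units.append(AccessUnit(1, "field", t, right))
    return units


for au in frame_field_interleave(["110A", "110B"], ["112A", "112B"]):
    print(au.view, au.structure, au.time, au.data)
```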

Frame field interleaved coding is one example for allowing prediction of a reduced resolution picture from a full resolution picture. By coding full resolution pictures as frames and coding reduced resolution pictures as fields, a relatively high coding and bitstream efficiency may be achieved. Decoded pictures 110 may be treated as complementary field pairs and used as reference pictures when a reduced resolution field, that is, one of pictures 112, is coded. In some examples, each picture of one view (e.g., the left view or view 0) may be coded as a frame, while each picture of the other view (e.g., the right view or view 1) may be coded as a field. Accordingly, the view including full resolution encoded pictures may be referred to as a full resolution view or a high resolution view, while the view including reduced resolution encoded pictures may be referred to as a reduced resolution view or a low resolution view.

This technique may be used as an extension of H.264/AVC in some examples. In some examples, this technique may be used as an extension to future coding standards, such as H.265, assuming these standards support both frame and field coding. Thus, these techniques do not necessarily require new coding tools at the block level.

FIG. 7 is a conceptual diagram illustrating field encoding of a picture to produce a reduced resolution encoded picture for inclusion in an asymmetric packed frame. FIG. 7 illustrates picture 120 as a view 0 (e.g., left eye view) reference picture and picture 122 as a view 1 (e.g., right eye view) picture to be coded as a field. In this example, rows of pixels of the view 0 reference picture corresponding to the top field of picture 120 are illustrated with X's, while rows of pixels of picture 120 corresponding to the bottom field of picture 120 are illustrated with O's.

In this example, picture 122 is encoded as a field relative to the top field of picture 120. Thus, rows of picture 122 may be predicted from the top field of picture 120. In other words, an encoder may use the top field of picture 120 as a reference field. For each pixel in picture 122, the video encoder may calculate the difference between the pixel and a collocated pixel in the corresponding row of the top field of picture 120. The video encoder may then encode an identifier of picture 120, an indication that the top field of picture 120 was used to predict the encoded version of picture 122, and the residual values (that is, the calculated differences between picture 122 and the top field of picture 120) to encode picture 122. The video encoder may then output the encoded version of picture 122, e.g., interleaved between two frame-coded pictures of view 0, as shown in FIG. 6.

FIG. 8 is a conceptual diagram illustrating inter-view prediction of a block 148 of a reduced resolution encoded picture 144 of an asymmetric packed frame 140. FIG. 8 illustrates two example asymmetric packed frames 130, 140. Asymmetric packed frame 130 includes full resolution encoded picture 132, corresponding to a left eye view (e.g., view 0), and reduced resolution encoded picture 134, corresponding to a right eye view (e.g., view 1). Asymmetric packed frame 140 includes full resolution encoded picture 142, corresponding to the left eye view, and reduced resolution encoded picture 144, corresponding to the right eye view.

Reduced resolution encoded picture 144 includes block 148, which may be intra-view predicted, e.g., relative to block 138 of reduced resolution encoded picture 134 of asymmetric frame 130. The example of FIG. 8 illustrates motion vector 154 that indicates a location of block 138 relative to block 148. Alternatively, block 148 may be inter-view predicted relative to, e.g., block 146 of full resolution encoded picture 142 of asymmetric packed frame 140 (as shown by displacement vector 150) or block 136 of full resolution encoded picture 132 of asymmetric packed frame 130 (as shown by displacement vector 152).

Displacement vector 150 may indicate the location of block 146 relative to block 148 in full resolution encoded picture 142. Displacement vector 152 may indicate the location of block 136 relative to block 148 in full resolution encoded picture 132 of asymmetric frame 130. Displacement vector 154 (which may be considered a motion vector) may indicate the location of block 138 relative to block 148 in reduced resolution encoded picture 134. In this manner, block 148 may be intra-view inter-frame encoded, inter-view intra-frame encoded, or inter-view inter-frame encoded. Accordingly, three encoding modes may exist: prediction of block 148 from the same view (e.g., the right eye view) in a different frame, as illustrated by the example of displacement vector 154; prediction of block 148 from the same frame in the other view (e.g., the left eye view), as illustrated by the example of displacement vector 150; and prediction of block 148 from a different frame and the other view (e.g., the left eye view), as illustrated by the example of displacement vector 152.
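The three modes differ only in whether the reference block shares the current block's view and whether it comes from the same asymmetric packed frame (time instance). A small classifier makes the mapping explicit. The function name and the time indices assigned to frames 130 and 140 (0 and 1) are assumptions for the example.

```python
def prediction_mode(cur_view: int, cur_time: int, ref_view: int, ref_time: int) -> str:
    """Name the prediction mode implied by a reference block's view and time instance."""
    same_view = cur_view == ref_view
    same_time = cur_time == ref_time
    if same_view and same_time:
        raise ValueError("reference lies in the current picture; not one of the three modes")
    if same_view:
        return "intra-view inter-frame"
    if same_time:
        return "inter-view intra-frame"
    return "inter-view inter-frame"


# Block 148: view 1 in asymmetric packed frame 140 (time 1); frame 130 is time 0.
print(prediction_mode(1, 1, ref_view=1, ref_time=0))  # block 138, vector 154
print(prediction_mode(1, 1, ref_view=0, ref_time=1))  # block 146, vector 150
print(prediction_mode(1, 1, ref_view=0, ref_time=0))  # block 136, vector 152
```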

As noted above, a video encoder, such as video encoder 20, may calculate displacement vectors 150, 152, 154 relative to the location of block 148 as if block 148 were external to asymmetric packed frame 140. That is, displacement vectors 150, 152, 154 may be calculated relative to the location of block 148 as if picture 144 were not combined with picture 142, but were a separate picture. To do so, let the location of block 148 within asymmetric frame 140 be identified as position (x₀, y₀). Let full resolution picture 142 have a height of h pixels and a width of w pixels.

In one example, assuming that asymmetric packed frame 140 has a top-bottom packing arrangement, as illustrated in the example of FIG. 8, picture 144 may have the same width as full resolution picture 142 (that is, a width of w pixels), but a height less than the height of full resolution picture 142. For example, picture 144 may have a height of h/2 pixels. In this example, displacement vectors 150, 152 may be calculated relative to location (x₀, 2*(y₀−h)). More generally, if reduced resolution picture 144 has a height of n*h/d, displacement vectors 150, 152 may be calculated relative to location (x₀, (d/n)*(y₀−h)), and displacement vector 154 may be calculated relative to location (x₀, y₀).
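The top-bottom coordinate mapping can be sketched as follows, using exact fractions so that general ratios d/n do not introduce rounding. The function names, the example picture height of 1080 rows, and the vector convention (reference position minus anchor position) are assumptions chosen for illustration.

```python
from fractions import Fraction


def top_bottom_anchor(x0: int, y0: int, h: int, n: int = 1, d: int = 2):
    """Position used for inter-view displacement vectors in a top-bottom frame.

    The full resolution picture occupies rows 0..h-1 and the reduced resolution
    picture (height n*h/d) starts at row h, so a block at (x0, y0) in the packed
    frame lies y0 - h rows into the reduced picture; scaling by d/n maps that row
    onto the full resolution grid.
    """
    if y0 < h:
        raise ValueError("block must lie in the reduced resolution region")
    return x0, Fraction(d, n) * (y0 - h)


def displacement(anchor, ref_pos):
    """Vector from the anchor position to the reference block position."""
    return ref_pos[0] - anchor[0], ref_pos[1] - anchor[1]


anchor = top_bottom_anchor(64, 1100, h=1080)  # half-height view 1 picture
print(anchor)
print(displacement(anchor, (72, 40)))         # reference block in the view 0 picture
```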

As another example, assuming that asymmetric packed frame 140 has a side-by-side packing arrangement, picture 144 may have the same height as full resolution picture 142 (that is, a height of h pixels), but a width less than the width of full resolution picture 142. For example, picture 144 may have a width of w/2 pixels. In this example, displacement vectors 150, 152 may be calculated relative to location (2*(x₀−w), y₀). More generally, if reduced resolution picture 144 has a width of n*w/d, displacement vectors 150, 152 may be calculated relative to location ((d/n)*(x₀−w), y₀).
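The side-by-side mapping is the same computation applied to the horizontal coordinate. As above, the function name and the example width of 1920 columns are assumptions for illustration.

```python
from fractions import Fraction


def side_by_side_anchor(x0: int, y0: int, w: int, n: int = 1, d: int = 2):
    """Position used for inter-view displacement vectors in a side-by-side frame.

    The full resolution picture occupies columns 0..w-1 and the reduced resolution
    picture (width n*w/d) starts at column w; scaling x0 - w by d/n maps the
    column onto the full resolution grid.
    """
    if x0 < w:
        raise ValueError("block must lie in the reduced resolution region")
    return Fraction(d, n) * (x0 - w), y0


print(side_by_side_anchor(2020, 36, w=1920))            # half-width view 1 picture
print(side_by_side_anchor(2020, 36, w=1920, n=2, d=3))  # two-thirds-width picture
```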

FIG. 9 is a flowchart illustrating an example method for combining two pictures of two different views into an asymmetric packed frame and encoding the asymmetric packed frame. Although generally described with respect to the example components of FIGS. 1 and 2, it should be understood that other encoders, encoding units, and encoding devices may be configured to perform the method of FIG. 9. Moreover, the steps of the method of FIG. 9 need not necessarily be performed in the order shown in FIG. 9, and additional or alternative steps may be performed.

In the example of FIG. 9, video encoder 20 first receives a picture of a left eye view (160), e.g., view 0. Video encoder 20 may also receive a picture of a right eye view, e.g., view 1 (162), such that the two received pictures form a stereo image pair. The left eye view and the right eye view may form a stereo view pair, also referred to as a complementary view pair. The received right eye view picture may correspond to the same temporal location as the received left eye view picture. That is, the left eye view picture and the right eye view picture may have been captured or generated at substantially the same time. Video encoder 20 may then reduce the resolution of the right eye view picture (164). In some examples, a video preprocessing unit of video encoder 20 may receive the pictures. In some examples, the video preprocessing unit may be external to video encoder 20.

In the example of FIG. 9, video encoder 20 reduces the resolution of the right eye view picture (164). For example, video encoder 20 may subsample the received right eye view picture (e.g., using row-wise, column-wise, or quincunx (checkerboard) subsampling), decimate rows or columns of the received right eye view picture, or otherwise reduce the resolution of the received right eye view picture. In some examples, video encoder 20 may produce a reduced resolution picture having either half of the width or half of the height of the full resolution picture of the left eye view. In other examples including a video preprocessor, the video preprocessor may be configured to reduce the resolution of the right eye view picture.

Video encoder 20 may then form an asymmetric frame including both the received left eye view picture and the downsampled right eye view picture (166). For example, video encoder 20 may form an asymmetric frame having a top-bottom arrangement, assuming that the right eye view picture has the same width as the left eye view picture. In some examples, video encoder 20 may form an asymmetric frame with a top-bottom arrangement in which the full resolution picture is above the reduced resolution picture, e.g., where the left eye view picture is placed above the right eye view picture with a reduced resolution. In other examples, video encoder 20 may form an asymmetric frame with a top-bottom arrangement in which the full resolution picture is below the reduced resolution picture, e.g., where the left eye view picture is placed below the right eye view picture with a reduced resolution. In still other examples, e.g., where the reduced resolution picture has the same height but a reduced width relative to the full resolution picture, video encoder 20 may form an asymmetric frame with a side-by-side arrangement, and the full resolution picture may be placed either to the left or to the right of the reduced resolution picture.
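The placement options for step 166 can be sketched with a single helper. The string values for the arrangement and the full_first parameter are hypothetical conventions for this example, not syntax from any standard.

```python
Picture = list[list[int]]


def pack_asymmetric(full: Picture, reduced: Picture, arrangement: str,
                    full_first: bool = True) -> Picture:
    """Combine a full and a reduced resolution picture into one asymmetric frame.

    arrangement is "top-bottom" (widths must match) or "side-by-side" (heights
    must match); full_first puts the full picture on top or on the left.
    """
    if arrangement == "top-bottom":
        if len(full[0]) != len(reduced[0]):
            raise ValueError("top-bottom packing needs equal widths")
        first, second = (full, reduced) if full_first else (reduced, full)
        return [list(row) for row in first + second]
    if arrangement == "side-by-side":
        if len(full) != len(reduced):
            raise ValueError("side-by-side packing needs equal heights")
        return [
            (list(f) + list(r)) if full_first else (list(r) + list(f))
            for f, r in zip(full, reduced)
        ]
    raise ValueError(f"unknown arrangement: {arrangement}")


print(pack_asymmetric([[1, 1], [1, 1]], [[2, 2]], "top-bottom", full_first=False))
```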

Video encoder 20 may then encode the asymmetric frame (168). In some examples, video encoder 20 may be configured to encode the right eye view picture portion of the asymmetric frame only relative to previously coded data of the right eye view. Thus, video encoder 20 may encode the reduced resolution picture in an intra-prediction (I-prediction) mode, relative to other data of the same picture, or in an inter-prediction (P-prediction or B-prediction) mode, relative to data of one or more previously encoded pictures of the right eye view.

In other examples, video encoder 20 may be configured to encode the reduced resolution right eye view picture portion of the asymmetric frame relative to either data of the right eye view or data of the left eye view. For example, video encoder 20 may encode the reduced resolution right eye view picture relative to the left eye view portion of the asymmetric frame. Video encoder 20 may also encode the reduced resolution right eye view portion of the asymmetric frame relative to left eye view portions of previously encoded asymmetric frames.

Video encoder 20 may encode the reduced resolution right eye view picture relative to either a picture of the right eye view or a picture of the left eye view of a previously encoded asymmetric frame. Thus, video encoder 20 may encode each block of the right eye view portion of the current asymmetric frame in an inter-mode relative to blocks of previously encoded right eye view pictures, blocks of the left eye view picture portion of the same asymmetric frame, or blocks of previously encoded left eye view portions of previously encoded asymmetric frames. As noted above, to encode the blocks of the current picture, video encoder 20 may calculate displacement vectors relative to the location of the block in the reduced resolution right eye view picture, rather than to the location of the block positioned within the asymmetric packed frame.

After encoding the asymmetric frame, video encoder 20 may signal whether inter-view prediction is used to encode the right eye view picture (170). For example, video encoder 20 may generate a frame packing arrangement SEI message that indicates both whether asymmetric packed frames are present in a bitstream formed by video encoder 20, and if so, whether any of the asymmetric packed frames includes a reduced resolution picture encoded in an inter-view prediction mode.

Video encoder 20 may also signal a frame packing type for the asymmetric packed frame (172). For example, video encoder 20 may include information in the frame packing arrangement SEI message discussed above indicating a frame packing arrangement for the asymmetric packed frame, e.g., side-by-side or top-bottom packing. Moreover, video encoder 20 may include information indicating the relative locations of the data for the full resolution picture and the data for the reduced resolution picture, e.g., in the frame packing arrangement SEI message.
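The signaled information can be modeled as a small record. The packing type codes below are the frame_packing_arrangement_type values of the H.264 frame packing arrangement SEI message (3 for side-by-side, 4 for top-bottom), and quincunx_sampling mirrors that message's quincunx_sampling_flag. The asymmetric, inter-view prediction, and placement fields are hypothetical extensions introduced only to illustrate the kind of information this disclosure describes signaling; they are not H.264 syntax.

```python
from dataclasses import dataclass
from enum import IntEnum


class PackingType(IntEnum):
    """frame_packing_arrangement_type values from the H.264 frame packing SEI."""
    CHECKERBOARD = 0
    COLUMN_INTERLEAVING = 1
    ROW_INTERLEAVING = 2
    SIDE_BY_SIDE = 3
    TOP_BOTTOM = 4
    TEMPORAL_INTERLEAVING = 5


@dataclass(frozen=True)
class AsymmetricPackingInfo:
    packing_type: PackingType
    quincunx_sampling: bool = False
    # Hypothetical fields, not part of the H.264 SEI syntax:
    asymmetric: bool = True              # one view has reduced resolution
    inter_view_prediction: bool = False  # reduced view may be predicted from the other view
    full_resolution_first: bool = True   # full picture on top (top-bottom) or left (side-by-side)

    def __post_init__(self):
        spatial = (PackingType.SIDE_BY_SIDE, PackingType.TOP_BOTTOM)
        if self.asymmetric and self.packing_type not in spatial:
            raise ValueError("asymmetric packing here assumes side-by-side or top-bottom")


info = AsymmetricPackingInfo(PackingType.TOP_BOTTOM, inter_view_prediction=True)
print(info)
```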

Video encoder 20 may then output the asymmetric frame (174). For example, video encoder 20, or a unit coupled to video encoder 20, may store the asymmetric frame to a computer-readable storage medium, broadcast the asymmetric frame, transmit the asymmetric frame via network transmission or network broadcast, or otherwise provide the encoded video data. In some examples, video encoder 20, or a unit coupled to video encoder 20, may output the asymmetric frame via a high definition multimedia interface (HDMI).

It should also be understood that video encoder 20 need not necessarily provide, for each frame of the bitstream, information indicating whether the bitstream includes asymmetric packed frames, the frame packing arrangements, and indications of the locations of the full and reduced resolution pictures in the frames. In some examples, video encoder 20 may provide a single set of information, e.g., a single frame packing arrangement SEI message, for the entire bitstream indicating this information for each frame of the bitstream. In some examples, video encoder 20 may provide the information periodically, e.g., after each video fragment, group of pictures (GOP), video segment, every certain number of frames, or at other periodic intervals. Video encoder 20, or another unit associated with video encoder 20, may also provide the frame packing arrangement SEI message on demand in some examples, e.g., in response to a request from a client device for the frame packing arrangement SEI message or a general request for header data of the bitstream.

FIG. 10 is a flowchart illustrating an example method for decoding an asymmetric frame. Although generally described with respect to the example components of FIGS. 1 and 3, it should be understood that other decoders, decoding units, and decoding devices may be configured to perform the method of FIG. 10. Moreover, the steps of the method of FIG. 10 need not necessarily be performed in the order shown in FIG. 10, and additional or alternative steps may be performed.

Initially, video decoder 30 may receive an asymmetric frame (200). In some examples, video decoder 30, or a unit coupled to video decoder 30, may receive the asymmetric frame via a high definition multimedia interface (HDMI). Video decoder 30 may then determine a frame packing type for the asymmetric frame (202). For example, video decoder 30 may receive a frame packing arrangement SEI message indicating the frame packing type for the asymmetric frame (e.g., top-bottom or side-by-side), as well as locations of a full resolution picture and a reduced resolution picture in the asymmetric frame. In some examples, video decoder 30 may have previously received a frame packing arrangement SEI message for the bitstream, prior to receiving the asymmetric frame, in which case video decoder 30 may have determined the frame packing type for frames of the bitstream (including the most recently received asymmetric frame) prior to receiving the asymmetric frame.

Based on the frame packing type information, video decoder 30 may decode the asymmetric frame (204). Video decoder 30 may first decode the left eye view portion of the asymmetric frame, followed by the right eye view portion of the asymmetric frame. Video decoder 30 may determine the locations of the left eye view and right eye view portions of the asymmetric frame based on the frame packing type information. In some examples, video decoder 30 may decode the right eye view picture relative to a left eye view picture.
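Once the frame packing type and the placement of the two pictures are known, locating the two view portions amounts to slicing the decoded frame. The string arrangement names, the full_first flag, and the function name are hypothetical conventions for this sketch, not SEI syntax.

```python
Picture = list[list[int]]


def split_asymmetric(frame: Picture, arrangement: str, full_first: bool,
                     full_height: int, full_width: int) -> tuple[Picture, Picture]:
    """Return (full resolution part, reduced resolution part) of a decoded frame."""
    if arrangement == "top-bottom":
        cut = full_height if full_first else len(frame) - full_height
        top, bottom = frame[:cut], frame[cut:]
        return (top, bottom) if full_first else (bottom, top)
    if arrangement == "side-by-side":
        cut = full_width if full_first else len(frame[0]) - full_width
        left = [row[:cut] for row in frame]
        right = [row[cut:] for row in frame]
        return (left, right) if full_first else (right, left)
    raise ValueError(f"unknown arrangement: {arrangement}")


full, reduced = split_asymmetric([[2, 1, 1], [2, 1, 1]], "side-by-side",
                                 full_first=False, full_height=2, full_width=2)
print(full, reduced)
```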

After decoding the asymmetric frame, video decoder 30 may separate the decoded frame into constituent pictures, e.g., the left eye view picture and the right eye view picture (206). Video decoder 30 may store a copy of the left eye view picture for reference to decode other left eye view pictures and, in some examples, right eye view pictures. Video decoder 30 may also store a copy of the decoded right eye view picture, e.g., before upsampling, for use as a reference picture for decoding right eye view portions of subsequently received asymmetric frames.

Continuing the example above, the right eye view picture may have a reduced resolution, although in other examples the right eye view picture may have full resolution and the left eye view picture may have reduced resolution. Accordingly, video decoder 30 may upsample the right eye view picture (208), e.g., by interpolating missing information to form a full resolution version of the right eye view picture. In this manner, video decoder 30 may form a right eye view picture having the same resolution as the left eye view picture. Video decoder 30 may then send the decoded left and right eye view pictures to display device 32, which may display the left and right eye view pictures simultaneously or nearly simultaneously (212).

FIG. 11 is a flowchart illustrating an example method for performing frame field interleaved coding in accordance with the techniques of this disclosure. Although generally described with respect to the example components of FIGS. 1 and 2, it should be understood that other encoders, encoding units, and encoding devices may be configured to perform the method of FIG. 11. Moreover, the steps of the method of FIG. 11 need not necessarily be performed in the order shown in FIG. 11, and additional or alternative steps may be performed.

Initially, video encoder 20 may receive a left eye view picture, e.g., a picture of view 0 (220). Video encoder 20 may then encode the left eye view picture (222), e.g., as a frame in either an intra- or an inter-prediction mode. Thus, video encoder 20 may encode the left eye view picture relative to other data of the same picture, or relative to one or more reference pictures of the left eye view.

Video encoder 20 may also receive a picture of a right eye view, e.g., view 1 (224), such that the right eye view picture and the left eye view picture form a stereo image pair. The left eye view and the right eye view may form a stereo view pair, also referred to as a complementary view pair. The received right eye view picture may correspond to the same temporal location as the received left eye view picture. That is, the left eye view picture and the right eye view picture may have been captured or generated at substantially the same time. Video encoder 20 may then reduce the resolution of the right eye view picture (226). In some examples, a video preprocessing unit of video encoder 20 may receive the right eye view picture and reduce the resolution of the right eye view picture prior to encoding. In some examples, the video preprocessing unit may be external to video encoder 20.

To reduce the resolution of the right eye view picture, video encoder 20 (or a video preprocessing unit) may, in some examples, decimate the right eye view picture. In this manner, video encoder 20 may produce a reduced resolution right eye view picture having one-half the vertical resolution of the left eye view picture.

Video encoder 20 may then encode the reduced resolution right eye view picture based on a picture of the left eye view (228). That is, video encoder 20 may use a previously coded left eye view picture as a reference picture for encoding the right eye view picture. Although in some cases video encoder 20 may use the left eye view picture encoded at step 222 as a reference picture for encoding the right eye view picture, in general, video encoder 20 may use any previously encoded picture of the left eye view as a reference picture. Thus, video encoder 20 is not limited to using the left eye view picture encoded at step 222 as the reference picture for encoding the right eye view picture. In some examples, video encoder 20 may use a previously encoded right eye view picture as the reference picture for encoding the current right eye view picture. That is, video encoder 20 may determine whether to use a previously encoded left eye view picture or a previously encoded right eye view picture as a reference picture for encoding the current right eye view picture. Furthermore, in some examples, video encoder 20 may select between intra- and inter-mode encoding of the current right eye view picture.

Video encoder 20 may encode the right eye view picture as a field. Accordingly, to encode the right eye view picture, video encoder 20 may calculate the difference between rows of the right eye view picture and alternate rows of the referenced left eye view picture. In this manner, video encoder 20 may encode the right eye view picture as a field referring to either a top field or a bottom field of a previously encoded left eye view picture.

Video encoder 20 may then output the encoded left eye view picture (230) and the encoded right eye view picture (232). In this example, video encoder 20 may output the encoded pictures into the same bitstream as separate access units, rather than forming an asymmetric packed frame. The bitstream may therefore include full resolution encoded pictures of the left eye view and reduced resolution encoded pictures of the right eye view, where the left eye view pictures are encoded as frames and the right eye view pictures are encoded as fields. The bitstream may resemble the illustration of FIG. 6, such that the bitstream is frame field interleaved encoded.

FIG. 12 is a flowchart illustrating an example method for decoding a frame field interleaved coded bitstream in accordance with the techniques of this disclosure. Although generally described with respect to the example components of FIGS. 1 and 3, it should be understood that other decoders, decoding units, and decoding devices may be configured to perform the method of FIG. 12. Moreover, the steps of the method of FIG. 12 need not necessarily be performed in the order shown, and additional or alternative steps may be performed.

Video decoder 30 may be configured to receive and decode a frame field interleaved encoded bitstream. Accordingly, video decoder 30 may receive an encoded picture of a left eye view, e.g., view 0 (240). Video decoder 30 may then decode the left eye view picture (242). Video decoder 30 may also receive an encoded picture of a right eye view, e.g., view 1 (244). The left eye view and right eye view may form a stereo view pair, also referred to as a complementary view pair. In this example, the left eye view picture and the right eye view picture may form independent access units, even though the two pictures may correspond to the same temporal period. For example, the two pictures may have been captured nearly simultaneously, such that the two pictures form a stereo image pair for three-dimensional video playback.

Video decoder 30 may decode the right eye view picture based on a previously decoded left eye view picture (246). That is, video decoder 30 may use a left eye view picture as a reference picture when decoding the right eye view picture. Although the reference picture may comprise the picture decoded at step 242, the reference picture may generally comprise any previously decoded picture of the left eye view. To decode the right eye view picture, video decoder 30 may add values of rows of the received, encoded right eye view picture to alternate rows of the reference picture, e.g., a top field or a bottom field of the reference picture. The bitstream may include information indicating a reference picture for the right eye view picture, as well as whether to use the top field or the bottom field as the reference field for decoding the right eye view picture. In other examples, video decoder 30 may further be configured to determine whether to decode the right eye view picture relative to a top field or a bottom field of a left eye view picture, or relative to a previously decoded right eye view picture.
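On the decoder side, reconstruction mirrors the encoder's row differencing: each residual row is added to the corresponding row of the selected field of the reference picture. This sketch assumes zero disparity, 8-bit samples clipped to [0, 255] by default, and that the field parity has already been parsed from the bitstream; the function name and parameters are hypothetical.

```python
from typing import List

Picture = List[List[int]]  # rows of integer samples (assumed representation)


def reconstruct_field(residual: Picture, left_frame: Picture, parity: str,
                      max_value: int = 255) -> Picture:
    """Add residual rows to alternate rows of a decoded left eye view picture.

    `parity` selects the reference field: "top" uses rows 0, 2, ... and
    "bottom" uses rows 1, 3, ... of left_frame. Zero disparity is assumed,
    and each reconstructed sample is clipped to [0, max_value].
    """
    if parity not in ("top", "bottom"):
        raise ValueError("parity must be 'top' or 'bottom'")
    reference = left_frame[(0 if parity == "top" else 1)::2]
    return [
        [min(max(res + ref, 0), max_value) for res, ref in zip(res_row, ref_row)]
        for res_row, ref_row in zip(residual, reference)
    ]
```

Given the left eye view frame [[1, 1], [2, 2], [3, 3], [4, 4]], the residual [[4, 4], [3, 3]] with the top field reconstructs the right eye view field [[5, 5], [6, 6]].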

After decoding the right eye view picture, video decoder 30 may upsample the decoded right eye view picture (248). For example, video decoder 30 may be configured to interpolate the missing rows of the decoded right eye view picture. Video decoder 30 may output the decoded left eye view picture (250) and the decoded and upsampled right eye view picture (252). For example, video decoder 30 may send the decoded pictures to a display, which may display the pictures simultaneously or nearly simultaneously.
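The description above does not tie the upsampling step to a particular filter. As one simple possibility, the sketch below rebuilds a full-height picture from a decoded field by placing the field rows at their original positions and filling each missing row with the rounded average of the rows directly above and below, copying the single neighbor at the picture edge. The vertical averaging filter is an assumption chosen for brevity; longer interpolation filters would typically give better quality.

```python
from typing import Dict, List

Picture = List[List[int]]  # rows of integer samples (assumed representation)


def upsample_field(field: Picture, parity: str = "top") -> Picture:
    """Interpolate the missing rows of a decoded field to restore full height."""
    if parity not in ("top", "bottom"):
        raise ValueError("parity must be 'top' or 'bottom'")
    offset = 0 if parity == "top" else 1
    # Field row i sits at frame row 2*i + offset.
    known: Dict[int, List[int]] = {2 * i + offset: list(row) for i, row in enumerate(field)}
    frame: Picture = []
    for y in range(2 * len(field)):
        if y in known:
            frame.append(known[y])
            continue
        above = known.get(y - 1)
        below = known.get(y + 1)
        if above is not None and below is not None:
            # Rounded average of the two vertical neighbors.
            frame.append([(a + b + 1) // 2 for a, b in zip(above, below)])
        else:
            # Picture edge: only one neighbor exists, so copy it.
            frame.append(list(above if above is not None else below))
    return frame
```

For the top field [[0, 10], [20, 30]], the result is [[0, 10], [10, 20], [20, 30], [20, 30]].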

In some examples, video decoder 30 may be included within a device that is not capable of three-dimensional video playback. In such examples, video decoder 30 may simply decode the left eye view pictures and skip (e.g., discard) the right eye view pictures. In this manner, devices may be capable of receiving and decoding a frame field interleaved encoded bitstream whether or not the devices are capable of decoding and/or rendering three-dimensional video data.
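A device without three-dimensional playback support could apply a filter like the one below before decoding. It assumes each coded picture carries a view identifier (0 for the left eye view, 1 for the right eye view) and that left eye view pictures are never predicted from right eye view pictures, which is consistent with the prediction described above, where the right eye view refers to the left eye view and not the reverse. The CodedPicture type and the function name are hypothetical.

```python
from typing import List, NamedTuple


class CodedPicture(NamedTuple):
    view_id: int    # 0 = left eye view, 1 = right eye view
    payload: bytes  # placeholder for the coded picture data


def pictures_to_decode(stream: List[CodedPicture], supports_3d: bool) -> List[CodedPicture]:
    """Return the coded pictures a decoder should decode.

    A 2D-only device keeps view 0 and discards view 1; a 3D-capable
    device keeps both views.
    """
    if supports_3d:
        return list(stream)
    return [pic for pic in stream if pic.view_id == 0]
```

Because discarding view 1 removes no reference picture that view 0 depends on, the remaining left eye view pictures still form a decodable two-dimensional sequence under the stated assumption.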

Although generally described with respect to a video encoder and a video decoder, the techniques of this disclosure may be implemented in other devices and coding units. For example, the techniques for forming an asymmetric packed frame may be performed by a transcoder configured to receive two separate, complementary bitstreams and to transcode the two bitstreams to form a single bitstream including asymmetric packed frames. As another example, the techniques for disassembling an asymmetric packed frame may be performed by a transcoder configured to receive a bitstream including asymmetric packed frames and to produce two separate bitstreams corresponding to respective views of the asymmetric packed frame, each including encoded video data for a respective view.
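The packing and unpacking that such a transcoder performs can be sketched in the pixel domain, assuming the transcoder fully decodes its input bitstreams and re-encodes the result (the decode and encode steps are omitted here). The top-bottom layout stacks a half-height right eye view picture below the full resolution left eye view picture; the side-by-side layout places a half-width right eye view picture to its right. Function names are illustrative.

```python
from typing import List, Tuple

Picture = List[List[int]]  # rows of integer samples (assumed representation)


def pack_top_bottom(left: Picture, right_half: Picture) -> Picture:
    """Stack a half-height right picture below a full resolution left picture."""
    if left and right_half and len(right_half[0]) != len(left[0]):
        raise ValueError("both pictures must have the same width")
    return [list(row) for row in left] + [list(row) for row in right_half]


def unpack_top_bottom(frame: Picture, h: int) -> Tuple[Picture, Picture]:
    """Split a top-bottom asymmetric frame whose left picture has h rows."""
    return [list(row) for row in frame[:h]], [list(row) for row in frame[h:]]


def pack_side_by_side(left: Picture, right_half: Picture) -> Picture:
    """Place a half-width right picture to the right of a full resolution left picture."""
    if len(left) != len(right_half):
        raise ValueError("both pictures must have the same height")
    return [list(left_row) + list(right_row) for left_row, right_row in zip(left, right_half)]


def unpack_side_by_side(frame: Picture, w: int) -> Tuple[Picture, Picture]:
    """Split a side-by-side asymmetric frame whose left picture has w columns."""
    return [row[:w] for row in frame], [row[w:] for row in frame]
```

Packing and unpacking are exact inverses here, so a round trip through either layout returns the original pictures.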

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware. Various examples have been described. These and other examples are within the scope of the following claims.

1. A method of encoding video data, the method comprising: encoding a first picture of a first view of a scene to produce an encoded picture with a first resolution; encoding at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution; and outputting the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.
2. The method of claim 1, wherein encoding the at least portion of the second picture comprises encoding the second picture as a field, and wherein outputting the encoded first resolution picture and the encoded reduced resolution picture comprises outputting the encoded first resolution picture and the encoded reduced resolution picture as distinct access units.
3. The method of claim 2, wherein encoding the second picture as a field comprises encoding the second picture relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view.
4. The method of claim 1, further comprising: receiving the first picture and the second picture, wherein when the second picture is received, the second picture comprises a reduced resolution relative to the first resolution; and forming an asymmetric frame comprising the first picture and the second picture, wherein encoding the first picture comprises encoding the asymmetric frame, wherein encoding the at least portion of the second picture comprises encoding the asymmetric frame, and wherein outputting the encoded first resolution picture and the encoded reduced resolution picture comprises outputting the encoded asymmetric frame.
5. The method of claim 4, wherein encoding the at least portion of the second picture comprises: encoding a block of the second picture relative to a reference block of the reference picture of the first view; and calculating a displacement vector that indicates a location of the reference block relative to the encoded block.
6. The method of claim 5, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein when the asymmetric frame comprises a top-bottom frame packing arrangement, calculating the displacement vector comprises calculating the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)).
7. The method of claim 5, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein when the asymmetric frame comprises a side-by-side frame packing arrangement, calculating the displacement vector comprises calculating the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀).
8. An apparatus for encoding video data, the apparatus comprising a video encoder configured to encode a first picture of a first view of a scene to produce an encoded picture with a first resolution, encode at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution, and output the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.
9. The apparatus of claim 8, wherein to encode the at least portion of the second picture, the video encoder is configured to encode the second picture as a field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view, and wherein to output the encoded first resolution picture and the encoded reduced resolution picture, the video encoder is configured to output the encoded first resolution picture and the encoded reduced resolution picture as distinct access units.
10. The apparatus of claim 8, wherein the video encoder is configured to receive the first picture and the second picture, wherein when the second picture is received, the second picture comprises a reduced resolution relative to the first resolution, and to form an asymmetric frame comprising the first picture and the second picture, wherein to encode the first picture, the video encoder is configured to encode the asymmetric frame, wherein to encode the at least portion of the second picture, the video encoder is configured to encode the asymmetric frame, and wherein to output the encoded first resolution picture and the encoded reduced resolution picture, the video encoder is configured to output the encoded asymmetric frame.
11. The apparatus of claim 10, wherein to encode the at least portion of the second picture, the video encoder is configured to encode a block of the second picture relative to a reference block of the reference picture of the first view and to calculate a displacement vector that indicates a location of the reference block relative to the encoded block.
12. The apparatus of claim 11, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the video encoder is configured to calculate the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)) when the asymmetric frame comprises a top-bottom frame packing arrangement.
13. The apparatus of claim 11, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the video encoder is configured to calculate the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀) when the asymmetric frame comprises a side-by-side frame packing arrangement.
14. The apparatus of claim 8, wherein the apparatus comprises at least one of: an integrated circuit; a microprocessor; and a wireless communication device that includes the video encoder.
15. An apparatus for encoding video data, the apparatus comprising: means for encoding a first picture of a first view of a scene to produce an encoded picture with a first resolution; means for encoding at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution; and means for outputting the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.
16. The apparatus of claim 15, wherein the means for encoding the at least portion of the second picture comprises means for encoding the second picture as a field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view, and wherein the means for outputting the encoded first resolution picture and the encoded reduced resolution picture comprises means for outputting the encoded first resolution picture and the encoded reduced resolution picture as distinct access units.
17. The apparatus of claim 15, further comprising: means for receiving the first picture and the second picture, wherein when the second picture is received, the second picture comprises a reduced resolution relative to the first resolution; and means for forming an asymmetric frame comprising the first picture and the second picture, wherein the means for encoding the first picture comprises means for encoding the asymmetric frame, wherein the means for encoding the at least portion of the second picture comprises the means for encoding the asymmetric frame, and wherein the means for outputting the encoded first resolution picture and the encoded reduced resolution picture comprises means for outputting the encoded asymmetric frame.
18. The apparatus of claim 17, wherein the means for encoding the at least portion of the second picture comprises: means for encoding a block of the second picture relative to a reference block of the reference picture of the first view; and means for calculating a displacement vector that indicates a location of the reference block relative to the encoded block.
19. The apparatus of claim 18, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the means for calculating the displacement vector comprises means for calculating the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)) when the asymmetric frame comprises a top-bottom frame packing arrangement.
20. The apparatus of claim 18, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the means for calculating the displacement vector comprises means for calculating the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀) when the asymmetric frame comprises a side-by-side frame packing arrangement.
21. A computer program product comprising a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for encoding video data to: encode a first picture of a first view of a scene to produce an encoded picture with a first resolution; encode at least a portion of a second picture of a second view of the scene relative to a reference picture of the first view to produce an encoded picture with a reduced resolution relative to the first resolution; and output the encoded first resolution picture and the encoded reduced resolution picture in a common bitstream.
22. The computer program product of claim 21, wherein the instructions that cause the processor to encode the at least portion of the second picture comprise instructions that cause the processor to encode the second picture as a field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view, and wherein the instructions that cause the processor to output the encoded first resolution picture and the encoded reduced resolution picture comprise instructions that cause the processor to output the encoded first resolution picture and the encoded reduced resolution picture as distinct access units.
23. The computer program product of claim 21, further comprising instructions that cause the processor to: receive the first picture and the second picture, wherein when the second picture is received, the second picture comprises a reduced resolution relative to the first resolution; and form an asymmetric frame comprising the first picture and the second picture, wherein the instructions that cause the processor to encode the first picture comprise instructions that cause the processor to encode the asymmetric frame, wherein the instructions that cause the processor to encode the at least portion of the second picture comprise the instructions that cause the processor to encode the asymmetric frame, and wherein the instructions that cause the processor to output the encoded first resolution picture and the encoded reduced resolution picture comprise instructions that cause the processor to output the encoded asymmetric frame.
24. The computer program product of claim 23, wherein the instructions that cause the processor to encode the at least portion of the second picture comprise instructions that cause the processor to: encode a block of the second picture relative to a reference block of the reference picture of the first view; and calculate a displacement vector that indicates a location of the reference block relative to the encoded block.
25. The computer program product of claim 24, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the instructions that cause the processor to calculate the displacement vector comprise instructions that cause the processor to calculate the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)) when the asymmetric frame comprises a top-bottom frame packing arrangement.
26. The computer program product of claim 24, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the instructions that cause the processor to calculate the displacement vector comprise instructions that cause the processor to calculate the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀) when the asymmetric frame comprises a side-by-side frame packing arrangement.
27. A method of decoding video data, the method comprising: receiving, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution; decoding the first resolution encoded picture to produce a first decoded picture; decoding at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view; upsampling the reduced resolution picture to produce a second decoded picture of the scene with the first resolution; and outputting the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.
28. The method of claim 27, wherein the reduced resolution picture comprises an encoded field, and wherein the first resolution encoded picture and the reduced resolution encoded picture comprise distinct access units, and wherein decoding the reduced resolution picture comprises decoding the encoded field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view.
29. The method of claim 27, wherein receiving the first resolution encoded picture and the reduced resolution encoded picture comprises receiving an encoded asymmetric frame comprising the first resolution encoded picture and the reduced resolution encoded picture, wherein decoding the first resolution encoded picture comprises decoding the asymmetric frame, and wherein decoding the reduced resolution encoded picture comprises decoding the asymmetric frame, the method further comprising separating the decoded asymmetric frame into the first decoded picture and the reduced resolution picture.
30. The method of claim 29, wherein decoding the at least portion of the reduced resolution picture comprises: receiving a displacement vector that indicates a location of a reference block relative to an encoded block of the reduced resolution picture; and decoding the encoded block of the reduced resolution picture relative to the reference block of the reference picture of the first view.
31. The method of claim 30, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a top-bottom frame packing arrangement, the method further comprising determining the location of the reference block using the displacement vector relative to position (x₀, 2*(y₀−h)).
32. The method of claim 30, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a side-by-side frame packing arrangement, the method further comprising determining the location of the reference block using the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀).
33. An apparatus for decoding video data, the apparatus comprising a video decoder configured to receive, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution, decode the first resolution encoded picture to produce a first decoded picture, decode at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view, upsample the reduced resolution picture to produce a second decoded picture of the scene with the first resolution, and output the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.
34. The apparatus of claim 33, wherein the reduced resolution picture comprises an encoded field, and wherein the first resolution encoded picture and the reduced resolution encoded picture comprise distinct access units, and wherein to decode the reduced resolution picture, the video decoder is configured to decode the encoded field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view.
35. The apparatus of claim 33, wherein the video decoder is configured to receive an asymmetric frame comprising the first resolution encoded picture and the reduced resolution encoded picture, wherein to decode the first resolution encoded picture, the video decoder is configured to decode the asymmetric frame, wherein to decode the reduced resolution encoded picture, the video decoder is configured to decode the asymmetric frame, and wherein the video decoder is configured to separate the decoded asymmetric frame into the first decoded picture and the reduced resolution picture.
36. The apparatus of claim 35, wherein to decode the at least portion of the reduced resolution picture, the video decoder is configured to receive a displacement vector that indicates a location of a reference block relative to an encoded block of the reduced resolution picture, and decode the encoded block of the reduced resolution picture relative to the reference block of the reference picture of the first view.
37. The apparatus of claim 36, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a top-bottom frame packing arrangement, and wherein the video decoder is configured to determine the location of the reference block using the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)).
38. The apparatus of claim 36, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the asymmetric frame, and wherein the asymmetric frame comprises a side-by-side frame packing arrangement, and wherein the video decoder is configured to determine the location of the reference block using the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀).
39. The apparatus of claim 33, wherein the apparatus comprises at least one of: an integrated circuit; a microprocessor; and a wireless communication device that includes the video decoder.
40. An apparatus for decoding video data, the apparatus comprising: means for receiving, from a common bitstream, a first resolution encoded picture of a first view of a scene and a reduced resolution encoded picture of a second view of the scene, wherein the reduced resolution encoded picture has a reduced resolution relative to the first resolution; means for decoding the first resolution encoded picture to produce a first decoded picture; means for decoding at least a portion of the reduced resolution encoded picture relative to a reference picture of the first view; means for upsampling the reduced resolution picture to produce a second decoded picture of the scene with the first resolution; and means for outputting the first decoded picture and the second decoded picture, wherein the first decoded picture and the second decoded picture form a stereo image pair.
41. The apparatus of claim 40, wherein the reduced resolution picture comprises an encoded field, wherein the first resolution encoded picture and the reduced resolution encoded picture comprise distinct access units, and wherein the means for decoding the reduced resolution picture comprises means for decoding the encoded field relative to at least one of a top field of a complementary field pair of the reference picture of the first view and a bottom field of the complementary field pair of the reference picture of the first view.
42. The apparatus of claim 40, wherein the means for receiving the first resolution encoded picture and the reduced resolution encoded picture comprises means for receiving an encoded asymmetric frame comprising the first resolution encoded picture and the reduced resolution encoded picture, wherein the means for decoding the first resolution encoded picture comprises means for decoding the asymmetric frame, and wherein the means for decoding the reduced resolution encoded picture comprises the means for decoding the asymmetric frame, the apparatus further comprising means for separating the decoded asymmetric frame into the first decoded picture and the reduced resolution picture.
43. The apparatus of claim 42, wherein the means for decoding the at least portion of the reduced resolution picture comprises: means for receiving a displacement vector that indicates a location of a reference block relative to an encoded block of the reduced resolution picture; and means for decoding the encoded block of the reduced resolution picture relative to the reference block of the reference picture of the first view.
44. The apparatus of claim 43, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a top-bottom frame packing arrangement, further comprising means for determining the location of the reference block using the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)).
45. The apparatus of claim 43, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a side-by-side frame packing arrangement, further comprising means for determining the location of the reference block using the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀).
 46. A computer program product comprising a computer-readablestorage medium having stored thereon instructions that, when executed,cause a processor of a device for decoding video data to: receive, froma common bitstream, a first resolution encoded picture of a first viewof a scene and a reduced resolution encoded picture of a second view ofthe scene; decode the first resolution encoded picture to produce afirst decoded picture; decode at least a portion of the reducedresolution encoded picture relative to a reference picture of the firstview; upsample the reduced resolution picture to produce a seconddecoded picture of the scene with the first resolution; and output thefirst decoded picture and the second decoded picture, wherein the firstdecoded picture and the second decoded picture form a stereo image pair.47. The computer program product of claim 46, wherein the reducedresolution picture comprises an encoded field, and wherein theinstructions that cause the processor to decode the reduced resolutionpicture comprise instructions that cause the processor to decode theencoded field relative to at least one of a top field of a complementaryfield pair of the reference picture of the first view and a bottom fieldof the complementary field pair of the reference picture of the firstview.
48. The computer program product of claim 46, wherein the instructions that cause the processor to receive the first resolution encoded picture and the reduced resolution encoded picture comprise instructions that cause the processor to receive an encoded asymmetric frame comprising the first resolution encoded picture and the reduced resolution encoded picture, wherein the instructions that cause the processor to decode the first resolution encoded picture comprise instructions that cause the processor to decode the asymmetric frame, and wherein the instructions that cause the processor to decode the reduced resolution encoded picture comprise the instructions that cause the processor to decode the asymmetric frame, further comprising instructions that cause the processor to separate the decoded asymmetric frame into the first decoded picture and the reduced resolution picture.
49. The computer program product of claim 48, wherein the instructions that cause the processor to decode the at least a portion of the reduced resolution picture comprise instructions that cause the processor to: receive a displacement vector that indicates a location of a reference block relative to an encoded block of the reduced resolution picture; and decode the encoded block of the reduced resolution picture relative to the reference block of the reference picture of the first view.
50. The computer program product of claim 49, wherein the first resolution picture comprises a height of h pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a top-bottom frame packing arrangement, further comprising instructions that cause the processor to determine the location of the reference block using the displacement vector pointing to the reference picture relative to position (x₀, 2*(y₀−h)).
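The separation step of claim 48 can be sketched as follows. The sketch assumes the same packing layouts as claims 50 and 51 (first resolution picture at the top or at the left, with no padding or cropping) and a single plane of samples stored as a list of rows; the function name is hypothetical.

```python
def split_asymmetric_frame(frame, w, h, arrangement):
    """Separate a decoded asymmetric frame (list of rows) into the first
    resolution picture (w x h) and the reduced resolution picture."""
    if arrangement == "top-bottom":
        # First resolution picture occupies rows 0..h-1; the rest is the second view.
        return [list(r) for r in frame[:h]], [list(r) for r in frame[h:]]
    if arrangement == "side-by-side":
        # First resolution picture occupies columns 0..w-1 of every row.
        return [list(r[:w]) for r in frame], [list(r[w:]) for r in frame]
    raise ValueError(f"unknown frame packing arrangement: {arrangement}")


frame = [[1, 2, 9], [3, 4, 8]]
first, reduced = split_asymmetric_frame(frame, w=2, h=2, arrangement="side-by-side")
print(first, reduced)  # [[1, 2], [3, 4]] [[9], [8]]
```

The reduced resolution picture returned here would then be upsampled to produce the second decoded picture.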
51. The computer program product of claim 49, wherein the first resolution picture comprises a width of w pixels, wherein the block comprises a position of (x₀, y₀) in the reduced resolution picture of the asymmetric frame, and wherein the asymmetric frame comprises a side-by-side frame packing arrangement, further comprising instructions that cause the processor to determine the location of the reference block using the displacement vector pointing to the reference picture relative to position (2*(x₀−w), y₀).